Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								/*
  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *   Generic  process - grouping  system . 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 * 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *   Based  originally  on  the  cpuset  system ,  extracted  by  Paul  Menage 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *   Copyright  ( C )  2006  Google ,  Inc 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 * 
							 
						 
					
						
							
								
									
										
										
										
											2010-03-10 15:22:20 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								 *   Notifications  support 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *   Copyright  ( C )  2009  Nokia  Corporation 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *   Author :  Kirill  A .  Shutemov 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 * 
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								 *   Copyright  notices  from  the  original  cpuset  code : 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *   - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *   Copyright  ( C )  2003  BULL  SA . 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *   Copyright  ( C )  2004 - 2006  Silicon  Graphics ,  Inc . 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 * 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *   Portions  derived  from  Patrick  Mochel ' s  sysfs  code . 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *   sysfs  is  Copyright  ( c )  2001 - 3  Patrick  Mochel 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 * 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *   2003 - 10 - 10  Written  by  Simon  Derr . 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *   2003 - 10 - 22  Updates  by  Stephen  Hemminger . 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *   2004  May - July  Rework  by  Paul  Jackson . 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *   - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 * 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *   This  file  is  subject  to  the  terms  and  conditions  of  the  GNU  General  Public 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *   License .   See  the  file  COPYING  in  the  main  directory  of  the  Linux 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *   distribution  for  more  details . 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								# include  <linux/cgroup.h> 
  
						 
					
						
							
								
									
										
										
										
											2011-06-02 21:20:51 +10:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								# include  <linux/cred.h> 
  
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:19 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								# include  <linux/ctype.h> 
  
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								# include  <linux/errno.h> 
  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								# include  <linux/fs.h> 
  
						 
					
						
							
								
									
										
										
										
											2011-06-02 21:20:51 +10:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								# include  <linux/init_task.h> 
  
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								# include  <linux/kernel.h> 
  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								# include  <linux/list.h> 
  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								# include  <linux/mm.h> 
  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								# include  <linux/mutex.h> 
  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								# include  <linux/mount.h> 
  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								# include  <linux/pagemap.h> 
  
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:35 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								# include  <linux/proc_fs.h> 
  
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								# include  <linux/rcupdate.h> 
  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								# include  <linux/sched.h> 
  
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:36 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								# include  <linux/backing-dev.h> 
  
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								# include  <linux/seq_file.h> 
  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								# include  <linux/slab.h> 
  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								# include  <linux/magic.h> 
  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								# include  <linux/spinlock.h> 
  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								# include  <linux/string.h> 
  
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:32 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								# include  <linux/sort.h> 
  
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:38 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								# include  <linux/kmod.h> 
  
						 
					
						
							
								
									
										
										
										
											2010-03-10 15:22:09 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								# include  <linux/module.h> 
  
						 
					
						
							
								
									
										
											 
										
											
												Add cgroupstats
This patch is inspired by the discussion at
http://lkml.org/lkml/2007/4/11/187 and implements per cgroup statistics
as suggested by Andrew Morton in http://lkml.org/lkml/2007/4/11/263.  The
patch is on top of 2.6.21-mm1 with Paul's cgroups v9 patches (forward
ported)
This patch implements per cgroup statistics infrastructure and re-uses
code from the taskstats interface.  A new set of cgroup operations are
registered with commands and attributes.  It should be very easy to
*extend* per cgroup statistics, by adding members to the cgroupstats
structure.
The current model for cgroupstats is a pull, a push model (to post
statistics on interesting events), should be very easy to add.  Currently
user space requests for statistics by passing the cgroup file
descriptor.  Statistics about the state of all the tasks in the cgroup
is returned to user space.
TODO's/NOTE:
This patch provides an infrastructure for implementing cgroup statistics.
Based on the needs of each controller, we can incrementally add more statistics,
event based support for notification of statistics, accumulation of taskstats
into cgroup statistics in the future.
Sample output
# ./cgroupstats -C /cgroup/a
sleeping 2, blocked 0, running 1, stopped 0, uninterruptible 0
# ./cgroupstats -C /cgroup/
sleeping 154, blocked 0, running 0, stopped 0, uninterruptible 0
If the approach looks good, I'll enhance and post the user space utility for
the same
Feedback, comments, test results are always welcome!
[akpm@linux-foundation.org: build fix]
Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
Cc: Paul Menage <menage@google.com>
Cc: Jay Lan <jlan@engr.sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:44 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								# include  <linux/delayacct.h> 
  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								# include  <linux/cgroupstats.h> 
  
						 
					
						
							
								
									
										
										
										
											2013-01-10 11:49:27 +08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								# include  <linux/hashtable.h> 
  
						 
					
						
							
								
									
										
										
										
											2008-07-26 03:46:43 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								# include  <linux/namei.h> 
  
						 
					
						
							
								
									
										
										
										
											2009-07-29 15:04:04 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								# include  <linux/pid_namespace.h> 
  
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:23 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								# include  <linux/idr.h> 
  
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:28 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								# include  <linux/vmalloc.h> /* TODO: replace with more sophisticated array */ 
  
						 
					
						
							
								
									
										
										
										
											2010-03-10 15:22:20 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								# include  <linux/eventfd.h> 
  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								# include  <linux/poll.h> 
  
						 
					
						
							
								
									
										
										
										
											2011-05-26 16:25:21 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								# include  <linux/flex_array.h> /* used in cgroup_attach_proc */ 
  
						 
					
						
							
								
									
										
										
										
											2012-04-21 09:13:46 +02:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								# include  <linux/kthread.h> 
  
						 
					
						
							
								
									
										
											 
										
											
												Add cgroupstats
This patch is inspired by the discussion at
http://lkml.org/lkml/2007/4/11/187 and implements per cgroup statistics
as suggested by Andrew Morton in http://lkml.org/lkml/2007/4/11/263.  The
patch is on top of 2.6.21-mm1 with Paul's cgroups v9 patches (forward
ported)
This patch implements per cgroup statistics infrastructure and re-uses
code from the taskstats interface.  A new set of cgroup operations are
registered with commands and attributes.  It should be very easy to
*extend* per cgroup statistics, by adding members to the cgroupstats
structure.
The current model for cgroupstats is a pull, a push model (to post
statistics on interesting events), should be very easy to add.  Currently
user space requests for statistics by passing the cgroup file
descriptor.  Statistics about the state of all the tasks in the cgroup
is returned to user space.
TODO's/NOTE:
This patch provides an infrastructure for implementing cgroup statistics.
Based on the needs of each controller, we can incrementally add more statistics,
event based support for notification of statistics, accumulation of taskstats
into cgroup statistics in the future.
Sample output
# ./cgroupstats -C /cgroup/a
sleeping 2, blocked 0, running 1, stopped 0, uninterruptible 0
# ./cgroupstats -C /cgroup/
sleeping 154, blocked 0, running 0, stopped 0, uninterruptible 0
If the approach looks good, I'll enhance and post the user space utility for
the same
Feedback, comments, test results are always welcome!
[akpm@linux-foundation.org: build fix]
Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
Cc: Paul Menage <menage@google.com>
Cc: Jay Lan <jlan@engr.sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:44 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2011-07-26 16:09:06 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								# include  <linux/atomic.h> 
  
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
											 
										
											
												cgroup: use negative bias on css->refcnt to block css_tryget()
When a cgroup is about to be removed, cgroup_clear_css_refs() is
called to check and ensure that there are no active css references.
This is currently achieved by dropping the refcnt to zero iff it has
only the base ref.  If all css refs could be dropped to zero, ref
clearing is successful and CSS_REMOVED is set on all css.  If not, the
base ref is restored.  While css ref is zero w/o CSS_REMOVED set, any
css_tryget() attempt on it busy loops so that they are atomic
w.r.t. the whole css ref clearing.
This does work but dropping and re-instating the base ref is somewhat
hairy and makes it difficult to add more logic to the put path as
there are two of them - the regular css_put() and the reversible base
ref clearing.
This patch updates css ref clearing such that blocking new
css_tryget() and putting the base ref are separate operations.
CSS_DEACT_BIAS, defined as INT_MIN, is added to css->refcnt and
css_tryget() busy loops while refcnt is negative.  After all css refs
are deactivated, if they were all one, ref clearing succeeded and
CSS_REMOVED is set and the base ref is put using the regular
css_put(); otherwise, CSS_DEACT_BIAS is subtracted from the refcnts
and the original postive values are restored.
css_refcnt() accessor which always returns the unbiased positive
reference counts is added and used to simplify refcnt usages.  While
at it, relocate and reformat comments in cgroup_has_css_refs().
This separates css->refcnt deactivation and putting the base ref,
which enables the next patch to make ref clearing optional.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizf@cn.fujitsu.com>
											 
										 
										
											2012-04-01 12:09:56 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								/* css deactivation bias, makes css->refcnt negative to deny new trygets */  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								# define CSS_DEACT_BIAS		INT_MIN 
  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2011-12-12 18:12:21 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								/*
  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  cgroup_mutex  is  the  master  lock .   Any  modification  to  cgroup  or  its 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  hierarchy  must  be  performed  while  holding  it . 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 * 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  cgroup_root_mutex  nests  inside  cgroup_mutex  and  should  be  held  to  modify 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  cgroupfs_root  of  any  cgroup  hierarchy  -  subsys  list ,  flags , 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  release_agent_path  and  so  on .   Modifying  requires  both  cgroup_mutex  and 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  cgroup_root_mutex .   Readers  can  acquire  either  of  the  two .   This  is  to 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  break  the  following  locking  order  cycle . 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 * 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *   A .  cgroup_mutex  - >  cred_guard_mutex  - >  s_type - > i_mutex_key  - >  namespace_sem 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *   B .  namespace_sem  - >  cgroup_mutex 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 * 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  B  happens  only  through  cgroup_show_options ( )  and  using  cgroup_root_mutex 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  breaks  it . 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 */ 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:38 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								static  DEFINE_MUTEX ( cgroup_mutex ) ;  
						 
					
						
							
								
									
										
										
										
											2011-12-12 18:12:21 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								static  DEFINE_MUTEX ( cgroup_root_mutex ) ;  
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:38 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
											 
										
											
												cgroups: revamp subsys array
This patch series provides the ability for cgroup subsystems to be
compiled as modules both within and outside the kernel tree.  This is
mainly useful for classifiers and subsystems that hook into components
that are already modules.  cls_cgroup and blkio-cgroup serve as the
example use cases for this feature.
It provides an interface cgroup_load_subsys() and cgroup_unload_subsys()
which modular subsystems can use to register and depart during runtime.
The net_cls classifier subsystem serves as the example for a subsystem
which can be converted into a module using these changes.
Patch #1 sets up the subsys[] array so its contents can be dynamic as
modules appear and (eventually) disappear.  Iterations over the array are
modified to handle when subsystems are absent, and the dynamic section of
the array is protected by cgroup_mutex.
Patch #2 implements an interface for modules to load subsystems, called
cgroup_load_subsys, similar to cgroup_init_subsys, and adds a module
pointer in struct cgroup_subsys.
Patch #3 adds a mechanism for unloading modular subsystems, which includes
a more advanced rework of the rudimentary reference counting introduced in
patch 2.
Patch #4 modifies the net_cls subsystem, which already had some module
declarations, to be configurable as a module, which also serves as a
simple proof-of-concept.
Part of implementing patches 2 and 4 involved updating css pointers in
each css_set when the module appears or leaves.  In doing this, it was
discovered that css_sets always remain linked to the dummy cgroup,
regardless of whether or not any subsystems are actually bound to it
(i.e., not mounted on an actual hierarchy).  The subsystem loading and
unloading code therefore should keep in mind the special cases where the
added subsystem is the only one in the dummy cgroup (and therefore all
css_sets need to be linked back into it) and where the removed subsys was
the only one in the dummy cgroup (and therefore all css_sets should be
unlinked from it) - however, as all css_sets always stay attached to the
dummy cgroup anyway, these cases are ignored.  Any fix that addresses this
issue should also make sure these cases are addressed in the subsystem
loading and unloading code.
This patch:
Make subsys[] able to be dynamically populated to support modular
subsystems
This patch reworks the way the subsys[] array is used so that subsystems
can register themselves after boot time, and enables the internals of
cgroups to be able to handle when subsystems are not present or may
appear/disappear.
Signed-off-by: Ben Blum <bblum@andrew.cmu.edu>
Acked-by: Li Zefan <lizf@cn.fujitsu.com>
Cc: Paul Menage <menage@google.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2010-03-10 15:22:07 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								/*
  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  Generate  an  array  of  cgroup  subsystem  pointers .  At  boot  time ,  this  is 
							 
						 
					
						
							
								
									
										
										
										
											2012-09-13 09:50:55 +02:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								 *  populated  with  the  built  in  subsystems ,  and  modular  subsystems  are 
							 
						 
					
						
							
								
									
										
											 
										
											
												cgroups: revamp subsys array
This patch series provides the ability for cgroup subsystems to be
compiled as modules both within and outside the kernel tree.  This is
mainly useful for classifiers and subsystems that hook into components
that are already modules.  cls_cgroup and blkio-cgroup serve as the
example use cases for this feature.
It provides an interface cgroup_load_subsys() and cgroup_unload_subsys()
which modular subsystems can use to register and depart during runtime.
The net_cls classifier subsystem serves as the example for a subsystem
which can be converted into a module using these changes.
Patch #1 sets up the subsys[] array so its contents can be dynamic as
modules appear and (eventually) disappear.  Iterations over the array are
modified to handle when subsystems are absent, and the dynamic section of
the array is protected by cgroup_mutex.
Patch #2 implements an interface for modules to load subsystems, called
cgroup_load_subsys, similar to cgroup_init_subsys, and adds a module
pointer in struct cgroup_subsys.
Patch #3 adds a mechanism for unloading modular subsystems, which includes
a more advanced rework of the rudimentary reference counting introduced in
patch 2.
Patch #4 modifies the net_cls subsystem, which already had some module
declarations, to be configurable as a module, which also serves as a
simple proof-of-concept.
Part of implementing patches 2 and 4 involved updating css pointers in
each css_set when the module appears or leaves.  In doing this, it was
discovered that css_sets always remain linked to the dummy cgroup,
regardless of whether or not any subsystems are actually bound to it
(i.e., not mounted on an actual hierarchy).  The subsystem loading and
unloading code therefore should keep in mind the special cases where the
added subsystem is the only one in the dummy cgroup (and therefore all
css_sets need to be linked back into it) and where the removed subsys was
the only one in the dummy cgroup (and therefore all css_sets should be
unlinked from it) - however, as all css_sets always stay attached to the
dummy cgroup anyway, these cases are ignored.  Any fix that addresses this
issue should also make sure these cases are addressed in the subsystem
loading and unloading code.
This patch:
Make subsys[] able to be dynamically populated to support modular
subsystems
This patch reworks the way the subsys[] array is used so that subsystems
can register themselves after boot time, and enables the internals of
cgroups to be able to handle when subsystems are not present or may
appear/disappear.
Signed-off-by: Ben Blum <bblum@andrew.cmu.edu>
Acked-by: Li Zefan <lizf@cn.fujitsu.com>
Cc: Paul Menage <menage@google.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2010-03-10 15:22:07 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								 *  registered  after  that .  The  mutable  section  of  this  array  is  protected  by 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  cgroup_mutex . 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 */ 
							 
						 
					
						
							
								
									
										
										
										
											2012-09-12 16:12:06 +02:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								# define SUBSYS(_x) [_x ## _subsys_id] = &_x ## _subsys, 
  
						 
					
						
							
								
									
										
										
										
											2012-09-12 16:12:05 +02:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								# define IS_SUBSYS_ENABLED(option) IS_BUILTIN(option) 
  
						 
					
						
							
								
									
										
											 
										
											
												cgroups: revamp subsys array
This patch series provides the ability for cgroup subsystems to be
compiled as modules both within and outside the kernel tree.  This is
mainly useful for classifiers and subsystems that hook into components
that are already modules.  cls_cgroup and blkio-cgroup serve as the
example use cases for this feature.
It provides an interface cgroup_load_subsys() and cgroup_unload_subsys()
which modular subsystems can use to register and depart during runtime.
The net_cls classifier subsystem serves as the example for a subsystem
which can be converted into a module using these changes.
Patch #1 sets up the subsys[] array so its contents can be dynamic as
modules appear and (eventually) disappear.  Iterations over the array are
modified to handle when subsystems are absent, and the dynamic section of
the array is protected by cgroup_mutex.
Patch #2 implements an interface for modules to load subsystems, called
cgroup_load_subsys, similar to cgroup_init_subsys, and adds a module
pointer in struct cgroup_subsys.
Patch #3 adds a mechanism for unloading modular subsystems, which includes
a more advanced rework of the rudimentary reference counting introduced in
patch 2.
Patch #4 modifies the net_cls subsystem, which already had some module
declarations, to be configurable as a module, which also serves as a
simple proof-of-concept.
Part of implementing patches 2 and 4 involved updating css pointers in
each css_set when the module appears or leaves.  In doing this, it was
discovered that css_sets always remain linked to the dummy cgroup,
regardless of whether or not any subsystems are actually bound to it
(i.e., not mounted on an actual hierarchy).  The subsystem loading and
unloading code therefore should keep in mind the special cases where the
added subsystem is the only one in the dummy cgroup (and therefore all
css_sets need to be linked back into it) and where the removed subsys was
the only one in the dummy cgroup (and therefore all css_sets should be
unlinked from it) - however, as all css_sets always stay attached to the
dummy cgroup anyway, these cases are ignored.  Any fix that addresses this
issue should also make sure these cases are addressed in the subsystem
loading and unloading code.
This patch:
Make subsys[] able to be dynamically populated to support modular
subsystems
This patch reworks the way the subsys[] array is used so that subsystems
can register themselves after boot time, and enables the internals of
cgroups to be able to handle when subsystems are not present or may
appear/disappear.
Signed-off-by: Ben Blum <bblum@andrew.cmu.edu>
Acked-by: Li Zefan <lizf@cn.fujitsu.com>
Cc: Paul Menage <menage@google.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2010-03-10 15:22:07 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								static  struct  cgroup_subsys  * subsys [ CGROUP_SUBSYS_COUNT ]  =  {  
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								# include  <linux/cgroup_subsys.h> 
  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								} ;  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:19 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								# define MAX_CGROUP_ROOT_NAMELEN 64 
  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								/*
  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  A  cgroupfs_root  represents  the  root  of  a  cgroup  hierarchy , 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  and  may  be  associated  with  a  superblock  to  form  an  active 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  hierarchy 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								struct  cgroupfs_root  {  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									struct  super_block  * sb ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									/*
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  The  bitmask  of  subsystems  intended  to  be  attached  to  this 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  hierarchy 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 */ 
							 
						 
					
						
							
								
									
										
										
										
											2012-08-23 16:53:31 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									unsigned  long  subsys_mask ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:23 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									/* Unique id for this hierarchy. */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									int  hierarchy_id ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
									/* The bitmask of subsystems currently attached to this hierarchy */ 
							 
						 
					
						
							
								
									
										
										
										
											2012-08-23 16:53:31 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									unsigned  long  actual_subsys_mask ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									/* A list running through the attached subsystems */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									struct  list_head  subsys_list ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									/* The root cgroup for this hierarchy */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									struct  cgroup  top_cgroup ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									/* Tracks how many cgroups are currently defined in hierarchy.*/ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									int  number_of_cgroups ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2009-01-07 18:07:41 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									/* A list running through the active hierarchies */ 
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
									struct  list_head  root_list ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2012-04-01 12:09:54 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									/* All cgroups on this root, cgroup_mutex protected */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									struct  list_head  allcg_list ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
									/* Hierarchy-specific flags */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									unsigned  long  flags ; 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:38 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2012-11-19 09:02:12 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									/* IDs for cgroups in this hierarchy */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									struct  ida  cgroup_ida ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2008-07-25 01:46:59 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									/* The path to use for release notifications. */ 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:38 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									char  release_agent_path [ PATH_MAX ] ; 
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:19 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									/* The name for this hierarchy - may be empty */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									char  name [ MAX_CGROUP_ROOT_NAMELEN ] ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								} ;  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								/*
  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  The  " rootnode "  hierarchy  is  the  " dummy hierarchy " ,  reserved  for  the 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  subsystems  that  are  otherwise  unattached  -  it  never  has  more  than  a 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  single  cgroup ,  and  all  tasks  are  part  of  that  cgroup . 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								static  struct  cgroupfs_root  rootnode ;  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2012-04-01 12:09:56 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								/*
  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  cgroupfs  file  entry ,  pointed  to  from  leaf  dentry - > d_fsdata . 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								struct  cfent  {  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									struct  list_head 		node ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									struct  dentry 			* dentry ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									struct  cftype 			* type ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								} ;  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
											 
										
											
												cgroup: CSS ID support
Patch for Per-CSS(Cgroup Subsys State) ID and private hierarchy code.
This patch attaches unique ID to each css and provides following.
 - css_lookup(subsys, id)
   returns pointer to struct cgroup_subysys_state of id.
 - css_get_next(subsys, id, rootid, depth, foundid)
   returns the next css under "root" by scanning
When cgroup_subsys->use_id is set, an id for css is maintained.
The cgroup framework only parepares
	- css_id of root css for subsys
	- id is automatically attached at creation of css.
	- id is *not* freed automatically. Because the cgroup framework
	  don't know lifetime of cgroup_subsys_state.
	  free_css_id() function is provided. This must be called by subsys.
There are several reasons to develop this.
	- Saving space .... For example, memcg's swap_cgroup is array of
	  pointers to cgroup. But it is not necessary to be very fast.
	  By replacing pointers(8bytes per ent) to ID (2byes per ent), we can
	  reduce much amount of memory usage.
	- Scanning without lock.
	  CSS_ID provides "scan id under this ROOT" function. By this, scanning
	  css under root can be written without locks.
	  ex)
	  do {
		rcu_read_lock();
		next = cgroup_get_next(subsys, id, root, &found);
		/* check sanity of next here */
		css_tryget();
		rcu_read_unlock();
		id = found + 1
	 } while(...)
Characteristics:
	- Each css has unique ID under subsys.
	- Lifetime of ID is controlled by subsys.
	- css ID contains "ID" and "Depth in hierarchy" and stack of hierarchy
	- Allowed ID is 1-65535, ID 0 is UNUSED ID.
Design Choices:
	- scan-by-ID v.s. scan-by-tree-walk.
	  As /proc's pid scan does, scan-by-ID is robust when scanning is done
	  by following kind of routine.
	  scan -> rest a while(release a lock) -> conitunue from interrupted
	  memcg's hierarchical reclaim does this.
	- When subsys->use_id is set, # of css in the system is limited to
	  65535.
[bharata@linux.vnet.ibm.com: remove rcu_read_lock() from css_get_next()]
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Paul Menage <menage@google.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2009-04-02 16:57:25 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								/*
  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  CSS  ID  - -  ID  per  subsys ' s  Cgroup  Subsys  State ( CSS ) .  used  only  when 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  cgroup_subsys - > use_id  ! =  0. 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								# define CSS_ID_MAX	(65535) 
  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								struct  css_id  {  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									/*
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  The  css  to  which  this  ID  points .  This  pointer  is  set  to  valid  value 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  after  cgroup  is  populated .  If  cgroup  is  removed ,  this  will  be  NULL . 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  This  pointer  is  expected  to  be  RCU - safe  because  destroy ( ) 
							 
						 
					
						
							
								
									
										
										
										
											2012-11-05 09:16:58 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									 *  is  called  after  synchronize_rcu ( ) .  But  for  safe  use ,  css_tryget ( ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  should  be  used  for  avoiding  race . 
							 
						 
					
						
							
								
									
										
											 
										
											
												cgroup: CSS ID support
Patch for Per-CSS(Cgroup Subsys State) ID and private hierarchy code.
This patch attaches unique ID to each css and provides following.
 - css_lookup(subsys, id)
   returns pointer to struct cgroup_subysys_state of id.
 - css_get_next(subsys, id, rootid, depth, foundid)
   returns the next css under "root" by scanning
When cgroup_subsys->use_id is set, an id for css is maintained.
The cgroup framework only parepares
	- css_id of root css for subsys
	- id is automatically attached at creation of css.
	- id is *not* freed automatically. Because the cgroup framework
	  don't know lifetime of cgroup_subsys_state.
	  free_css_id() function is provided. This must be called by subsys.
There are several reasons to develop this.
	- Saving space .... For example, memcg's swap_cgroup is array of
	  pointers to cgroup. But it is not necessary to be very fast.
	  By replacing pointers(8bytes per ent) to ID (2byes per ent), we can
	  reduce much amount of memory usage.
	- Scanning without lock.
	  CSS_ID provides "scan id under this ROOT" function. By this, scanning
	  css under root can be written without locks.
	  ex)
	  do {
		rcu_read_lock();
		next = cgroup_get_next(subsys, id, root, &found);
		/* check sanity of next here */
		css_tryget();
		rcu_read_unlock();
		id = found + 1
	 } while(...)
Characteristics:
	- Each css has unique ID under subsys.
	- Lifetime of ID is controlled by subsys.
	- css ID contains "ID" and "Depth in hierarchy" and stack of hierarchy
	- Allowed ID is 1-65535, ID 0 is UNUSED ID.
Design Choices:
	- scan-by-ID v.s. scan-by-tree-walk.
	  As /proc's pid scan does, scan-by-ID is robust when scanning is done
	  by following kind of routine.
	  scan -> rest a while(release a lock) -> conitunue from interrupted
	  memcg's hierarchical reclaim does this.
	- When subsys->use_id is set, # of css in the system is limited to
	  65535.
[bharata@linux.vnet.ibm.com: remove rcu_read_lock() from css_get_next()]
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Paul Menage <menage@google.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2009-04-02 16:57:25 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									 */ 
							 
						 
					
						
							
								
									
										
										
										
											2010-02-24 19:41:39 +01:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									struct  cgroup_subsys_state  __rcu  * css ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												cgroup: CSS ID support
Patch for Per-CSS(Cgroup Subsys State) ID and private hierarchy code.
This patch attaches unique ID to each css and provides following.
 - css_lookup(subsys, id)
   returns pointer to struct cgroup_subysys_state of id.
 - css_get_next(subsys, id, rootid, depth, foundid)
   returns the next css under "root" by scanning
When cgroup_subsys->use_id is set, an id for css is maintained.
The cgroup framework only parepares
	- css_id of root css for subsys
	- id is automatically attached at creation of css.
	- id is *not* freed automatically. Because the cgroup framework
	  don't know lifetime of cgroup_subsys_state.
	  free_css_id() function is provided. This must be called by subsys.
There are several reasons to develop this.
	- Saving space .... For example, memcg's swap_cgroup is array of
	  pointers to cgroup. But it is not necessary to be very fast.
	  By replacing pointers(8bytes per ent) to ID (2byes per ent), we can
	  reduce much amount of memory usage.
	- Scanning without lock.
	  CSS_ID provides "scan id under this ROOT" function. By this, scanning
	  css under root can be written without locks.
	  ex)
	  do {
		rcu_read_lock();
		next = cgroup_get_next(subsys, id, root, &found);
		/* check sanity of next here */
		css_tryget();
		rcu_read_unlock();
		id = found + 1
	 } while(...)
Characteristics:
	- Each css has unique ID under subsys.
	- Lifetime of ID is controlled by subsys.
	- css ID contains "ID" and "Depth in hierarchy" and stack of hierarchy
	- Allowed ID is 1-65535, ID 0 is UNUSED ID.
Design Choices:
	- scan-by-ID v.s. scan-by-tree-walk.
	  As /proc's pid scan does, scan-by-ID is robust when scanning is done
	  by following kind of routine.
	  scan -> rest a while(release a lock) -> conitunue from interrupted
	  memcg's hierarchical reclaim does this.
	- When subsys->use_id is set, # of css in the system is limited to
	  65535.
[bharata@linux.vnet.ibm.com: remove rcu_read_lock() from css_get_next()]
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Paul Menage <menage@google.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2009-04-02 16:57:25 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									/*
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  ID  of  this  css . 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									unsigned  short  id ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									/*
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  Depth  in  hierarchy  which  this  ID  belongs  to . 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									unsigned  short  depth ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									/*
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  ID  is  freed  by  RCU .  ( and  lookup  routine  is  RCU  safe . ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									struct  rcu_head  rcu_head ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									/*
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  Hierarchy  of  CSS  ID  belongs  to . 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									unsigned  short  stack [ 0 ] ;  /* Array of Length (depth+1) */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								} ;  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2010-03-10 15:22:20 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								/*
  
						 
					
						
							
								
									
										
										
										
											2011-03-30 22:57:33 -03:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								 *  cgroup_event  represents  events  which  userspace  want  to  receive . 
							 
						 
					
						
							
								
									
										
										
										
											2010-03-10 15:22:20 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								 */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								struct  cgroup_event  {  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									/*
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  Cgroup  which  the  event  belongs  to . 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									struct  cgroup  * cgrp ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									/*
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  Control  file  which  the  event  associated . 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									struct  cftype  * cft ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									/*
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  eventfd  to  signal  userspace  about  the  event . 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									struct  eventfd_ctx  * eventfd ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									/*
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  Each  of  these  stored  in  a  list  by  the  cgroup . 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									struct  list_head  list ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									/*
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  All  fields  below  needed  to  unregister  event  when 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  userspace  closes  eventfd . 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									poll_table  pt ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									wait_queue_head_t  * wqh ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									wait_queue_t  wait ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									struct  work_struct  remove ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								} ;  
						 
					
						
							
								
									
										
											 
										
											
												cgroup: CSS ID support
Patch for Per-CSS(Cgroup Subsys State) ID and private hierarchy code.
This patch attaches unique ID to each css and provides following.
 - css_lookup(subsys, id)
   returns pointer to struct cgroup_subysys_state of id.
 - css_get_next(subsys, id, rootid, depth, foundid)
   returns the next css under "root" by scanning
When cgroup_subsys->use_id is set, an id for css is maintained.
The cgroup framework only parepares
	- css_id of root css for subsys
	- id is automatically attached at creation of css.
	- id is *not* freed automatically. Because the cgroup framework
	  don't know lifetime of cgroup_subsys_state.
	  free_css_id() function is provided. This must be called by subsys.
There are several reasons to develop this.
	- Saving space .... For example, memcg's swap_cgroup is array of
	  pointers to cgroup. But it is not necessary to be very fast.
	  By replacing pointers(8bytes per ent) to ID (2byes per ent), we can
	  reduce much amount of memory usage.
	- Scanning without lock.
	  CSS_ID provides "scan id under this ROOT" function. By this, scanning
	  css under root can be written without locks.
	  ex)
	  do {
		rcu_read_lock();
		next = cgroup_get_next(subsys, id, root, &found);
		/* check sanity of next here */
		css_tryget();
		rcu_read_unlock();
		id = found + 1
	 } while(...)
Characteristics:
	- Each css has unique ID under subsys.
	- Lifetime of ID is controlled by subsys.
	- css ID contains "ID" and "Depth in hierarchy" and stack of hierarchy
	- Allowed ID is 1-65535, ID 0 is UNUSED ID.
Design Choices:
	- scan-by-ID v.s. scan-by-tree-walk.
	  As /proc's pid scan does, scan-by-ID is robust when scanning is done
	  by following kind of routine.
	  scan -> rest a while(release a lock) -> conitunue from interrupted
	  memcg's hierarchical reclaim does this.
	- When subsys->use_id is set, # of css in the system is limited to
	  65535.
[bharata@linux.vnet.ibm.com: remove rcu_read_lock() from css_get_next()]
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Paul Menage <menage@google.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2009-04-02 16:57:25 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								/* The list of hierarchy roots */  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								static  LIST_HEAD ( roots ) ;  
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:36 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								static  int  root_count ;  
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:23 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								static  DEFINE_IDA ( hierarchy_ida ) ;  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								static  int  next_hierarchy_id ;  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								static  DEFINE_SPINLOCK ( hierarchy_id_lock ) ;  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								/* dummytop is a shorthand for the dummy hierarchy's top cgroup */  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								# define dummytop (&rootnode.top_cgroup) 
  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								/* This flag indicates whether tasks in the fork and exit paths should
  
						 
					
						
							
								
									
										
										
										
											2008-02-23 15:24:09 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								 *  check  for  fork / exit  handlers  to  call .  This  avoids  us  having  to  do 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  extra  work  in  the  fork / exit  path  if  none  of  the  subsystems  need  to 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  be  called . 
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								 */ 
							 
						 
					
						
							
								
									
										
										
										
											2008-07-25 01:46:56 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								static  int  need_forkexit_callback  __read_mostly ;  
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2012-11-19 08:13:37 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								static  int  cgroup_destroy_locked ( struct  cgroup  * cgrp ) ;  
						 
					
						
							
								
									
										
										
										
											2012-12-01 00:21:28 +08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								static  int  cgroup_addrm_files ( struct  cgroup  * cgrp ,  struct  cgroup_subsys  * subsys ,  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											      struct  cftype  cfts [ ] ,  bool  is_add ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2012-11-19 08:13:37 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2010-02-22 17:04:50 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								# ifdef CONFIG_PROVE_LOCKING 
  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								int  cgroup_lock_is_held ( void )  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									return  lockdep_is_held ( & cgroup_mutex ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								# else  /* #ifdef CONFIG_PROVE_LOCKING */ 
  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								int  cgroup_lock_is_held ( void )  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									return  mutex_is_locked ( & cgroup_mutex ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								# endif  /* #else #ifdef CONFIG_PROVE_LOCKING */ 
  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								EXPORT_SYMBOL_GPL ( cgroup_lock_is_held ) ;  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2012-06-14 14:55:30 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								static  int  css_unbias_refcnt ( int  refcnt )  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									return  refcnt  > =  0  ?  refcnt  :  refcnt  -  CSS_DEACT_BIAS ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
											 
										
											
												cgroup: use negative bias on css->refcnt to block css_tryget()
When a cgroup is about to be removed, cgroup_clear_css_refs() is
called to check and ensure that there are no active css references.
This is currently achieved by dropping the refcnt to zero iff it has
only the base ref.  If all css refs could be dropped to zero, ref
clearing is successful and CSS_REMOVED is set on all css.  If not, the
base ref is restored.  While css ref is zero w/o CSS_REMOVED set, any
css_tryget() attempt on it busy loops so that they are atomic
w.r.t. the whole css ref clearing.
This does work but dropping and re-instating the base ref is somewhat
hairy and makes it difficult to add more logic to the put path as
there are two of them - the regular css_put() and the reversible base
ref clearing.
This patch updates css ref clearing such that blocking new
css_tryget() and putting the base ref are separate operations.
CSS_DEACT_BIAS, defined as INT_MIN, is added to css->refcnt and
css_tryget() busy loops while refcnt is negative.  After all css refs
are deactivated, if they were all one, ref clearing succeeded and
CSS_REMOVED is set and the base ref is put using the regular
css_put(); otherwise, CSS_DEACT_BIAS is subtracted from the refcnts
and the original postive values are restored.
css_refcnt() accessor which always returns the unbiased positive
reference counts is added and used to simplify refcnt usages.  While
at it, relocate and reformat comments in cgroup_has_css_refs().
This separates css->refcnt deactivation and putting the base ref,
which enables the next patch to make ref clearing optional.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizf@cn.fujitsu.com>
											 
										 
										
											2012-04-01 12:09:56 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								/* the current nr of refs, always >= 0 whether @css is deactivated or not */  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								static  int  css_refcnt ( struct  cgroup_subsys_state  * css )  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									int  v  =  atomic_read ( & css - > refcnt ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2012-06-14 14:55:30 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									return  css_unbias_refcnt ( v ) ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												cgroup: use negative bias on css->refcnt to block css_tryget()
When a cgroup is about to be removed, cgroup_clear_css_refs() is
called to check and ensure that there are no active css references.
This is currently achieved by dropping the refcnt to zero iff it has
only the base ref.  If all css refs could be dropped to zero, ref
clearing is successful and CSS_REMOVED is set on all css.  If not, the
base ref is restored.  While css ref is zero w/o CSS_REMOVED set, any
css_tryget() attempt on it busy loops so that they are atomic
w.r.t. the whole css ref clearing.
This does work but dropping and re-instating the base ref is somewhat
hairy and makes it difficult to add more logic to the put path as
there are two of them - the regular css_put() and the reversible base
ref clearing.
This patch updates css ref clearing such that blocking new
css_tryget() and putting the base ref are separate operations.
CSS_DEACT_BIAS, defined as INT_MIN, is added to css->refcnt and
css_tryget() busy loops while refcnt is negative.  After all css refs
are deactivated, if they were all one, ref clearing succeeded and
CSS_REMOVED is set and the base ref is put using the regular
css_put(); otherwise, CSS_DEACT_BIAS is subtracted from the refcnts
and the original postive values are restored.
css_refcnt() accessor which always returns the unbiased positive
reference counts is added and used to simplify refcnt usages.  While
at it, relocate and reformat comments in cgroup_has_css_refs().
This separates css->refcnt deactivation and putting the base ref,
which enables the next patch to make ref clearing optional.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizf@cn.fujitsu.com>
											 
										 
										
											2012-04-01 12:09:56 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								/* convenient tests for these bits */  
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:40:44 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								inline  int  cgroup_is_removed ( const  struct  cgroup  * cgrp )  
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:40:44 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									return  test_bit ( CGRP_REMOVED ,  & cgrp - > flags ) ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								/* bits in struct cgroupfs_root flags field */  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								enum  {  
						 
					
						
							
								
									
										
											 
										
											
												cgroup: add xattr support
This is one of the items in the plumber's wish list.
For use cases:
>> What would the use case be for this?
>
> Attaching meta information to services, in an easily discoverable
> way. For example, in systemd we create one cgroup for each service, and
> could then store data like the main pid of the specific service as an
> xattr on the cgroup itself. That way we'd have almost all service state
> in the cgroupfs, which would make it possible to terminate systemd and
> later restart it without losing any state information. But there's more:
> for example, some very peculiar services cannot be terminated on
> shutdown (i.e. fakeraid DM stuff) and it would be really nice if the
> services in question could just mark that on their cgroup, by setting an
> xattr. On the more desktopy side of things there are other
> possibilities: for example there are plans defining what an application
> is along the lines of a cgroup (i.e. an app being a collection of
> processes). With xattrs one could then attach an icon or human readable
> program name on the cgroup.
>
> The key idea is that this would allow attaching runtime meta information
> to cgroups and everything they model (services, apps, vms), that doesn't
> need any complex userspace infrastructure, has good access control
> (i.e. because the file system enforces that anyway, and there's the
> "trusted." xattr namespace), notifications (inotify), and can easily be
> shared among applications.
>
> Lennart
v7:
- no changes
v6:
- remove user xattr namespace, only allow trusted and security
v5:
- check for capabilities before setting/removing xattrs
v4:
- no changes
v3:
- instead of config option, use mount option to enable xattr support
Original-patch-by: Li Zefan <lizefan@huawei.com>
Cc: Li Zefan <lizefan@huawei.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Hillf Danton <dhillf@gmail.com>
Cc: Lennart Poettering <lpoetter@redhat.com>
Signed-off-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Aristeu Rozanski <aris@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
											 
										 
										
											2012-08-23 16:53:30 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									ROOT_NOPREFIX , 	/* mounted subsystems have no named prefix */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									ROOT_XATTR , 	/* supports extended attributes */ 
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								} ;  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2008-02-07 00:13:46 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								static  int  cgroup_is_releasable ( const  struct  cgroup  * cgrp )  
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:38 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									const  int  bits  = 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:40:44 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										( 1  < <  CGRP_RELEASABLE )  | 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										( 1  < <  CGRP_NOTIFY_ON_RELEASE ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									return  ( cgrp - > flags  &  bits )  = =  bits ; 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:38 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2008-02-07 00:13:46 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								static  int  notify_on_release ( const  struct  cgroup  * cgrp )  
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:38 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:40:44 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									return  test_bit ( CGRP_NOTIFY_ON_RELEASE ,  & cgrp - > flags ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:38 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								/*
  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  for_each_subsys ( )  allows  you  to  iterate  on  each  subsystem  attached  to 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  an  active  hierarchy 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								# define for_each_subsys(_root, _ss) \ 
  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								list_for_each_entry ( _ss ,  & _root - > subsys_list ,  sibling )  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2009-01-07 18:07:41 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								/* for_each_active_root() allows you to iterate across the active hierarchies */  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								# define for_each_active_root(_root) \ 
  
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								list_for_each_entry ( _root ,  & roots ,  root_list )  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2012-04-01 12:09:55 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								static  inline  struct  cgroup  * __d_cgrp ( struct  dentry  * dentry )  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									return  dentry - > d_fsdata ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2012-04-01 12:09:56 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								static  inline  struct  cfent  * __d_cfe ( struct  dentry  * dentry )  
						 
					
						
							
								
									
										
										
										
											2012-04-01 12:09:55 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									return  dentry - > d_fsdata ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2012-04-01 12:09:56 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								static  inline  struct  cftype  * __d_cft ( struct  dentry  * dentry )  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									return  __d_cfe ( dentry ) - > type ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:38 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								/* the list of cgroups eligible for automatic release. Protected by
  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  release_list_lock  */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								static  LIST_HEAD ( release_list ) ;  
						 
					
						
							
								
									
										
										
										
											2009-07-25 16:47:45 +02:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								static  DEFINE_RAW_SPINLOCK ( release_list_lock ) ;  
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:38 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								static  void  cgroup_release_agent ( struct  work_struct  * work ) ;  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								static  DECLARE_WORK ( release_agent_work ,  cgroup_release_agent ) ;  
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:40:44 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								static  void  check_for_release ( struct  cgroup  * cgrp ) ;  
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:38 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:36 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								/* Link structure for associating css_set objects with cgroups */  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								struct  cg_cgroup_link  {  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									/*
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  List  running  through  cg_cgroup_links  associated  with  a 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  cgroup ,  anchored  on  cgroup - > css_sets 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 */ 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:40:44 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									struct  list_head  cgrp_link_list ; 
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:22 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									struct  cgroup  * cgrp ; 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:36 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									/*
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  List  running  through  cg_cgroup_links  pointing  at  a 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  single  css_set  object ,  anchored  on  css_set - > cg_links 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									struct  list_head  cg_link_list ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									struct  css_set  * cg ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								} ;  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								/* The default css_set - used by init and its children prior to any
  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  hierarchies  being  mounted .  It  contains  a  pointer  to  the  root  state 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  for  each  subsystem .  Also  used  to  anchor  the  list  of  css_sets .  Not 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  reference - counted ,  to  improve  performance  when  child  cgroups 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  haven ' t  been  created . 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								static  struct  css_set  init_css_set ;  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								static  struct  cg_cgroup_link  init_css_set_link ;  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2010-03-10 15:22:09 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								static  int  cgroup_init_idr ( struct  cgroup_subsys  * ss ,  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											   struct  cgroup_subsys_state  * css ) ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												cgroup: CSS ID support
Patch for Per-CSS(Cgroup Subsys State) ID and private hierarchy code.
This patch attaches unique ID to each css and provides following.
 - css_lookup(subsys, id)
   returns pointer to struct cgroup_subysys_state of id.
 - css_get_next(subsys, id, rootid, depth, foundid)
   returns the next css under "root" by scanning
When cgroup_subsys->use_id is set, an id for css is maintained.
The cgroup framework only parepares
	- css_id of root css for subsys
	- id is automatically attached at creation of css.
	- id is *not* freed automatically. Because the cgroup framework
	  don't know lifetime of cgroup_subsys_state.
	  free_css_id() function is provided. This must be called by subsys.
There are several reasons to develop this.
	- Saving space .... For example, memcg's swap_cgroup is array of
	  pointers to cgroup. But it is not necessary to be very fast.
	  By replacing pointers(8bytes per ent) to ID (2byes per ent), we can
	  reduce much amount of memory usage.
	- Scanning without lock.
	  CSS_ID provides "scan id under this ROOT" function. By this, scanning
	  css under root can be written without locks.
	  ex)
	  do {
		rcu_read_lock();
		next = cgroup_get_next(subsys, id, root, &found);
		/* check sanity of next here */
		css_tryget();
		rcu_read_unlock();
		id = found + 1
	 } while(...)
Characteristics:
	- Each css has unique ID under subsys.
	- Lifetime of ID is controlled by subsys.
	- css ID contains "ID" and "Depth in hierarchy" and stack of hierarchy
	- Allowed ID is 1-65535, ID 0 is UNUSED ID.
Design Choices:
	- scan-by-ID v.s. scan-by-tree-walk.
	  As /proc's pid scan does, scan-by-ID is robust when scanning is done
	  by following kind of routine.
	  scan -> rest a while(release a lock) -> conitunue from interrupted
	  memcg's hierarchical reclaim does this.
	- When subsys->use_id is set, # of css in the system is limited to
	  65535.
[bharata@linux.vnet.ibm.com: remove rcu_read_lock() from css_get_next()]
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Paul Menage <menage@google.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2009-04-02 16:57:25 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:36 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								/* css_set_lock protects the list of css_set objects, and the
  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  chain  of  tasks  off  each  css_set .   Nests  outside  task - > alloc_lock 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  due  to  cgroup_iter_start ( )  */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								static  DEFINE_RWLOCK ( css_set_lock ) ;  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								static  int  css_set_count ;  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:22 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								/*
  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  hash  table  for  cgroup  groups .  This  improves  the  performance  to  find 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  an  existing  css_set .  This  hash  doesn ' t  ( currently )  take  into 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  account  cgroups  in  empty  hierarchies . 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 */ 
							 
						 
					
						
							
								
									
										
										
										
											2008-04-29 01:00:11 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								# define CSS_SET_HASH_BITS	7 
  
						 
					
						
							
								
									
										
										
										
											2013-01-10 11:49:27 +08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								static  DEFINE_HASHTABLE ( css_set_table ,  CSS_SET_HASH_BITS ) ;  
						 
					
						
							
								
									
										
										
										
											2008-04-29 01:00:11 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2013-01-10 11:49:27 +08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								static  unsigned  long  css_set_hash ( struct  cgroup_subsys_state  * css [ ] )  
						 
					
						
							
								
									
										
										
										
											2008-04-29 01:00:11 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									int  i ; 
							 
						 
					
						
							
								
									
										
										
										
											2013-01-10 11:49:27 +08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									unsigned  long  key  =  0UL ; 
							 
						 
					
						
							
								
									
										
										
										
											2008-04-29 01:00:11 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									for  ( i  =  0 ;  i  <  CGROUP_SUBSYS_COUNT ;  i + + ) 
							 
						 
					
						
							
								
									
										
										
										
											2013-01-10 11:49:27 +08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										key  + =  ( unsigned  long ) css [ i ] ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									key  =  ( key  > >  16 )  ^  key ; 
							 
						 
					
						
							
								
									
										
										
										
											2008-04-29 01:00:11 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2013-01-10 11:49:27 +08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									return  key ; 
							 
						 
					
						
							
								
									
										
										
										
											2008-04-29 01:00:11 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:36 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								/* We don't maintain the lists running through each css_set to its
  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  task  until  after  the  first  call  to  cgroup_iter_start ( ) .  This 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  reduces  the  fork ( ) / exit ( )  overhead  for  people  who  have  cgroups 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  compiled  into  their  kernel  but  not  actually  in  use  */ 
							 
						 
					
						
							
								
									
										
										
										
											2008-07-25 01:46:56 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								static  int  use_task_css_set_links  __read_mostly ;  
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:36 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:23 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								static  void  __put_css_set ( struct  css_set  * cg ,  int  taskexit )  
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:33 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
									
										
										
										
											2008-07-25 01:46:55 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									struct  cg_cgroup_link  * link ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									struct  cg_cgroup_link  * saved_link ; 
							 
						 
					
						
							
								
									
										
										
										
											2008-10-18 20:28:03 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									/*
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  Ensure  that  the  refcount  doesn ' t  hit  zero  while  any  readers 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  can  see  it .  Similar  to  atomic_dec_and_lock ( ) ,  but  for  an 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  rwlock 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									if  ( atomic_add_unless ( & cg - > refcount ,  - 1 ,  1 ) ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										return ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									write_lock ( & css_set_lock ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									if  ( ! atomic_dec_and_test ( & cg - > refcount ) )  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										write_unlock ( & css_set_lock ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										return ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									} 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:38 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:23 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									/* This css_set is dead. unlink it and release cgroup refcounts */ 
							 
						 
					
						
							
								
									
										
										
										
											2013-01-10 11:49:27 +08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									hash_del ( & cg - > hlist ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:23 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									css_set_count - - ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									list_for_each_entry_safe ( link ,  saved_link ,  & cg - > cg_links , 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
												 cg_link_list )  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										struct  cgroup  * cgrp  =  link - > cgrp ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										list_del ( & link - > cg_link_list ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										list_del ( & link - > cgrp_link_list ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2013-01-24 14:43:28 +08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										/*
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										 *  We  may  not  be  holding  cgroup_mutex ,  and  if  cgrp - > count  is 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										 *  dropped  to  0  the  cgroup  can  be  destroyed  at  any  time ,  hence 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										 *  rcu_read_lock  is  used  to  keep  it  alive . 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										 */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										rcu_read_lock ( ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:40:44 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										if  ( atomic_dec_and_test ( & cgrp - > count )  & & 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										    notify_on_release ( cgrp ) )  { 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:38 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
											if  ( taskexit ) 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:40:44 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
												set_bit ( CGRP_RELEASABLE ,  & cgrp - > flags ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											check_for_release ( cgrp ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:38 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										} 
							 
						 
					
						
							
								
									
										
										
										
											2013-01-24 14:43:28 +08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										rcu_read_unlock ( ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:23 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										kfree ( link ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:38 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									} 
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:23 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									write_unlock ( & css_set_lock ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2011-03-15 17:53:46 +08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									kfree_rcu ( cg ,  rcu_head ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:33 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:36 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								/*
  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  refcounted  get / put  for  css_set  objects 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								static  inline  void  get_css_set ( struct  css_set  * cg )  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
									
										
										
										
											2008-10-18 20:28:03 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									atomic_inc ( & cg - > refcount ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:36 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								static  inline  void  put_css_set ( struct  css_set  * cg )  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
									
										
										
										
											2008-10-18 20:28:03 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									__put_css_set ( cg ,  0 ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:36 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:38 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								static  inline  void  put_css_set_taskexit ( struct  css_set  * cg )  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
									
										
										
										
											2008-10-18 20:28:03 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									__put_css_set ( cg ,  1 ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:38 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:22 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								/*
  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  compare_css_sets  -  helper  function  for  find_existing_css_set ( ) . 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  @ cg :  candidate  css_set  being  tested 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  @ old_cg :  existing  css_set  for  a  task 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  @ new_cgrp :  cgroup  that ' s  being  entered  by  the  task 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  @ template :  desired  set  of  css  pointers  in  css_set  ( pre - calculated ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 * 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  Returns  true  if  " cg "  matches  " old_cg "  except  for  the  hierarchy 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  which  " new_cgrp "  belongs  to ,  for  which  it  should  match  " new_cgrp " . 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								static  bool  compare_css_sets ( struct  css_set  * cg ,  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											     struct  css_set  * old_cg , 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											     struct  cgroup  * new_cgrp , 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											     struct  cgroup_subsys_state  * template [ ] ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									struct  list_head  * l1 ,  * l2 ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									if  ( memcmp ( template ,  cg - > subsys ,  sizeof ( cg - > subsys ) ) )  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										/* Not all subsystems matched */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										return  false ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									} 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									/*
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  Compare  cgroup  pointers  in  order  to  distinguish  between 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  different  cgroups  in  heirarchies  with  no  subsystems .  We 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  could  get  by  with  just  this  check  alone  ( and  skip  the 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  memcmp  above )  but  on  most  setups  the  memcmp  check  will 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  avoid  the  need  for  this  more  expensive  check  on  almost  all 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  candidates . 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									l1  =  & cg - > cg_links ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									l2  =  & old_cg - > cg_links ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									while  ( 1 )  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										struct  cg_cgroup_link  * cgl1 ,  * cgl2 ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										struct  cgroup  * cg1 ,  * cg2 ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										l1  =  l1 - > next ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										l2  =  l2 - > next ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										/* See if we reached the end - both lists are equal length. */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										if  ( l1  = =  & cg - > cg_links )  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											BUG_ON ( l2  ! =  & old_cg - > cg_links ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											break ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										}  else  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											BUG_ON ( l2  = =  & old_cg - > cg_links ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										} 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										/* Locate the cgroups associated with these links. */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										cgl1  =  list_entry ( l1 ,  struct  cg_cgroup_link ,  cg_link_list ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										cgl2  =  list_entry ( l2 ,  struct  cg_cgroup_link ,  cg_link_list ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										cg1  =  cgl1 - > cgrp ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										cg2  =  cgl2 - > cgrp ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										/* Hierarchies should be linked in the same order. */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										BUG_ON ( cg1 - > root  ! =  cg2 - > root ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										/*
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										 *  If  this  hierarchy  is  the  hierarchy  of  the  cgroup 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										 *  that ' s  changing ,  then  we  need  to  check  that  this 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										 *  css_set  points  to  the  new  cgroup ;  if  it ' s  any  other 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										 *  hierarchy ,  then  this  css_set  should  point  to  the 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										 *  same  cgroup  as  the  old  css_set . 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										 */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										if  ( cg1 - > root  = =  new_cgrp - > root )  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											if  ( cg1  ! =  new_cgrp ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
												return  false ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										}  else  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											if  ( cg1  ! =  cg2 ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
												return  false ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										} 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									} 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									return  true ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:36 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								/*
  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  find_existing_css_set ( )  is  a  helper  for 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  find_css_set ( ) ,  and  checks  to  see  whether  an  existing 
							 
						 
					
						
							
								
									
										
										
										
											2008-04-29 01:00:11 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								 *  css_set  is  suitable . 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:36 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								 * 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  oldcg :  the  cgroup  group  that  we ' re  using  before  the  cgroup 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  transition 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 * 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:40:44 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								 *  cgrp :  the  cgroup  that  we ' re  moving  into 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:36 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								 * 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  template :  location  in  which  to  build  the  desired  set  of  subsystem 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  state  objects  for  the  new  cgroup  group 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								static  struct  css_set  * find_existing_css_set (  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									struct  css_set  * oldcg , 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:40:44 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									struct  cgroup  * cgrp , 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:36 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									struct  cgroup_subsys_state  * template [ ] ) 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:33 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									int  i ; 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:40:44 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									struct  cgroupfs_root  * root  =  cgrp - > root ; 
							 
						 
					
						
							
								
									
										
										
										
											2008-04-29 01:00:11 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									struct  css_set  * cg ; 
							 
						 
					
						
							
								
									
										
										
										
											2013-01-10 11:49:27 +08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									unsigned  long  key ; 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:36 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
											 
										
											
												cgroups: revamp subsys array
This patch series provides the ability for cgroup subsystems to be
compiled as modules both within and outside the kernel tree.  This is
mainly useful for classifiers and subsystems that hook into components
that are already modules.  cls_cgroup and blkio-cgroup serve as the
example use cases for this feature.
It provides an interface cgroup_load_subsys() and cgroup_unload_subsys()
which modular subsystems can use to register and depart during runtime.
The net_cls classifier subsystem serves as the example for a subsystem
which can be converted into a module using these changes.
Patch #1 sets up the subsys[] array so its contents can be dynamic as
modules appear and (eventually) disappear.  Iterations over the array are
modified to handle when subsystems are absent, and the dynamic section of
the array is protected by cgroup_mutex.
Patch #2 implements an interface for modules to load subsystems, called
cgroup_load_subsys, similar to cgroup_init_subsys, and adds a module
pointer in struct cgroup_subsys.
Patch #3 adds a mechanism for unloading modular subsystems, which includes
a more advanced rework of the rudimentary reference counting introduced in
patch 2.
Patch #4 modifies the net_cls subsystem, which already had some module
declarations, to be configurable as a module, which also serves as a
simple proof-of-concept.
Part of implementing patches 2 and 4 involved updating css pointers in
each css_set when the module appears or leaves.  In doing this, it was
discovered that css_sets always remain linked to the dummy cgroup,
regardless of whether or not any subsystems are actually bound to it
(i.e., not mounted on an actual hierarchy).  The subsystem loading and
unloading code therefore should keep in mind the special cases where the
added subsystem is the only one in the dummy cgroup (and therefore all
css_sets need to be linked back into it) and where the removed subsys was
the only one in the dummy cgroup (and therefore all css_sets should be
unlinked from it) - however, as all css_sets always stay attached to the
dummy cgroup anyway, these cases are ignored.  Any fix that addresses this
issue should also make sure these cases are addressed in the subsystem
loading and unloading code.
This patch:
Make subsys[] able to be dynamically populated to support modular
subsystems
This patch reworks the way the subsys[] array is used so that subsystems
can register themselves after boot time, and enables the internals of
cgroups to be able to handle when subsystems are not present or may
appear/disappear.
Signed-off-by: Ben Blum <bblum@andrew.cmu.edu>
Acked-by: Li Zefan <lizf@cn.fujitsu.com>
Cc: Paul Menage <menage@google.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2010-03-10 15:22:07 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									/*
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  Build  the  set  of  subsystem  state  objects  that  we  want  to  see  in  the 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  new  css_set .  while  subsystems  can  change  globally ,  the  entries  here 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  won ' t  change ,  so  no  need  for  locking . 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 */ 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:36 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									for  ( i  =  0 ;  i  <  CGROUP_SUBSYS_COUNT ;  i + + )  { 
							 
						 
					
						
							
								
									
										
										
										
											2012-08-23 16:53:31 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										if  ( root - > subsys_mask  &  ( 1UL  < <  i ) )  { 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:36 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
											/* Subsystem is in this hierarchy. So we want
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											 *  the  subsystem  state  from  the  new 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											 *  cgroup  */ 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:40:44 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
											template [ i ]  =  cgrp - > subsys [ i ] ; 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:36 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										}  else  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											/* Subsystem is not in this hierarchy, so we
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											 *  don ' t  want  to  change  the  subsystem  state  */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											template [ i ]  =  oldcg - > subsys [ i ] ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										} 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									} 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2013-01-10 11:49:27 +08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									key  =  css_set_hash ( template ) ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												hlist: drop the node parameter from iterators
I'm not sure why, but the hlist for each entry iterators were conceived
        list_for_each_entry(pos, head, member)
The hlist ones were greedy and wanted an extra parameter:
        hlist_for_each_entry(tpos, pos, head, member)
Why did they need an extra pos parameter? I'm not quite sure. Not only
they don't really need it, it also prevents the iterator from looking
exactly like the list iterator, which is unfortunate.
Besides the semantic patch, there was some manual work required:
 - Fix up the actual hlist iterators in linux/list.h
 - Fix up the declaration of other iterators based on the hlist ones.
 - A very small amount of places were using the 'node' parameter, this
 was modified to use 'obj->member' instead.
 - Coccinelle didn't handle the hlist_for_each_entry_safe iterator
 properly, so those had to be fixed up manually.
The semantic patch which is mostly the work of Peter Senna Tschudin is here:
@@
iterator name hlist_for_each_entry, hlist_for_each_entry_continue, hlist_for_each_entry_from, hlist_for_each_entry_rcu, hlist_for_each_entry_rcu_bh, hlist_for_each_entry_continue_rcu_bh, for_each_busy_worker, ax25_uid_for_each, ax25_for_each, inet_bind_bucket_for_each, sctp_for_each_hentry, sk_for_each, sk_for_each_rcu, sk_for_each_from, sk_for_each_safe, sk_for_each_bound, hlist_for_each_entry_safe, hlist_for_each_entry_continue_rcu, nr_neigh_for_each, nr_neigh_for_each_safe, nr_node_for_each, nr_node_for_each_safe, for_each_gfn_indirect_valid_sp, for_each_gfn_sp, for_each_host;
type T;
expression a,c,d,e;
identifier b;
statement S;
@@
-T b;
    <+... when != b
(
hlist_for_each_entry(a,
- b,
c, d) S
|
hlist_for_each_entry_continue(a,
- b,
c) S
|
hlist_for_each_entry_from(a,
- b,
c) S
|
hlist_for_each_entry_rcu(a,
- b,
c, d) S
|
hlist_for_each_entry_rcu_bh(a,
- b,
c, d) S
|
hlist_for_each_entry_continue_rcu_bh(a,
- b,
c) S
|
for_each_busy_worker(a, c,
- b,
d) S
|
ax25_uid_for_each(a,
- b,
c) S
|
ax25_for_each(a,
- b,
c) S
|
inet_bind_bucket_for_each(a,
- b,
c) S
|
sctp_for_each_hentry(a,
- b,
c) S
|
sk_for_each(a,
- b,
c) S
|
sk_for_each_rcu(a,
- b,
c) S
|
sk_for_each_from
-(a, b)
+(a)
S
+ sk_for_each_from(a) S
|
sk_for_each_safe(a,
- b,
c, d) S
|
sk_for_each_bound(a,
- b,
c) S
|
hlist_for_each_entry_safe(a,
- b,
c, d, e) S
|
hlist_for_each_entry_continue_rcu(a,
- b,
c) S
|
nr_neigh_for_each(a,
- b,
c) S
|
nr_neigh_for_each_safe(a,
- b,
c, d) S
|
nr_node_for_each(a,
- b,
c) S
|
nr_node_for_each_safe(a,
- b,
c, d) S
|
- for_each_gfn_sp(a, c, d, b) S
+ for_each_gfn_sp(a, c, d) S
|
- for_each_gfn_indirect_valid_sp(a, c, d, b) S
+ for_each_gfn_indirect_valid_sp(a, c, d) S
|
for_each_host(a,
- b,
c) S
|
for_each_host_safe(a,
- b,
c, d) S
|
for_each_mesh_entry(a,
- b,
c, d) S
)
    ...+>
[akpm@linux-foundation.org: drop bogus change from net/ipv4/raw.c]
[akpm@linux-foundation.org: drop bogus hunk from net/ipv6/raw.c]
[akpm@linux-foundation.org: checkpatch fixes]
[akpm@linux-foundation.org: fix warnings]
[akpm@linux-foudnation.org: redo intrusive kvm changes]
Tested-by: Peter Senna Tschudin <peter.senna@gmail.com>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2013-02-27 17:06:00 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									hash_for_each_possible ( css_set_table ,  cg ,  hlist ,  key )  { 
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:22 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										if  ( ! compare_css_sets ( cg ,  oldcg ,  cgrp ,  template ) ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											continue ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										/* This css_set matches what we need */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										return  cg ; 
							 
						 
					
						
							
								
									
										
										
										
											2008-04-29 01:00:11 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									} 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:36 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									/* No existing cgroup group matched */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									return  NULL ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2008-07-29 22:33:19 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								static  void  free_cg_links ( struct  list_head  * tmp )  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									struct  cg_cgroup_link  * link ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									struct  cg_cgroup_link  * saved_link ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									list_for_each_entry_safe ( link ,  saved_link ,  tmp ,  cgrp_link_list )  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										list_del ( & link - > cgrp_link_list ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										kfree ( link ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									} 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:36 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								/*
  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  allocate_cg_links ( )  allocates  " count "  cg_cgroup_link  structures 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:40:44 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								 *  and  chains  them  on  tmp  through  their  cgrp_link_list  fields .  Returns  0  on 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:36 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								 *  success  or  a  negative  error 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								static  int  allocate_cg_links ( int  count ,  struct  list_head  * tmp )  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									struct  cg_cgroup_link  * link ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									int  i ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									INIT_LIST_HEAD ( tmp ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									for  ( i  =  0 ;  i  <  count ;  i + + )  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										link  =  kmalloc ( sizeof ( * link ) ,  GFP_KERNEL ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										if  ( ! link )  { 
							 
						 
					
						
							
								
									
										
										
										
											2008-07-29 22:33:19 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
											free_cg_links ( tmp ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:36 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
											return  - ENOMEM ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										} 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:40:44 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										list_add ( & link - > cgrp_link_list ,  tmp ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:36 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									} 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									return  0 ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2009-01-07 18:07:42 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								/**
  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  link_css_set  -  a  helper  function  to  link  a  css_set  to  a  cgroup 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  @ tmp_cg_links :  cg_cgroup_link  objects  allocated  by  allocate_cg_links ( ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  @ cg :  the  css_set  to  be  linked 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  @ cgrp :  the  destination  cgroup 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								static  void  link_css_set ( struct  list_head  * tmp_cg_links ,  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											 struct  css_set  * cg ,  struct  cgroup  * cgrp ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									struct  cg_cgroup_link  * link ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									BUG_ON ( list_empty ( tmp_cg_links ) ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									link  =  list_first_entry ( tmp_cg_links ,  struct  cg_cgroup_link , 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
												cgrp_link_list ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									link - > cg  =  cg ; 
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:22 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									link - > cgrp  =  cgrp ; 
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:23 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									atomic_inc ( & cgrp - > count ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2009-01-07 18:07:42 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									list_move ( & link - > cgrp_link_list ,  & cgrp - > css_sets ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:22 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									/*
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  Always  add  links  to  the  tail  of  the  list  so  that  the  list 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  is  sorted  by  order  of  hierarchy  creation 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									list_add_tail ( & link - > cg_link_list ,  & cg - > cg_links ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2009-01-07 18:07:42 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:36 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								/*
  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  find_css_set ( )  takes  an  existing  cgroup  group  and  a 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  cgroup  object ,  and  returns  a  css_set  object  that ' s 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  equivalent  to  the  old  group ,  but  with  the  given  cgroup 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  substituted  into  the  appropriate  hierarchy .  Must  be  called  with 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  cgroup_mutex  held 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								static  struct  css_set  * find_css_set (  
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:40:44 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									struct  css_set  * oldcg ,  struct  cgroup  * cgrp ) 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:36 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									struct  css_set  * res ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									struct  cgroup_subsys_state  * template [ CGROUP_SUBSYS_COUNT ] ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									struct  list_head  tmp_cg_links ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:22 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									struct  cg_cgroup_link  * link ; 
							 
						 
					
						
							
								
									
										
										
										
											2013-01-10 11:49:27 +08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									unsigned  long  key ; 
							 
						 
					
						
							
								
									
										
										
										
											2008-04-29 01:00:11 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:36 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									/* First see if we already have a cgroup group that matches
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  the  desired  set  */ 
							 
						 
					
						
							
								
									
										
										
										
											2008-07-25 01:46:54 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									read_lock ( & css_set_lock ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:40:44 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									res  =  find_existing_css_set ( oldcg ,  cgrp ,  template ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:36 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									if  ( res ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										get_css_set ( res ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2008-07-25 01:46:54 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									read_unlock ( & css_set_lock ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:36 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									if  ( res ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										return  res ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									res  =  kmalloc ( sizeof ( * res ) ,  GFP_KERNEL ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									if  ( ! res ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										return  NULL ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									/* Allocate all the cg_cgroup_link objects that we'll need */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									if  ( allocate_cg_links ( root_count ,  & tmp_cg_links )  <  0 )  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										kfree ( res ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										return  NULL ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									} 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2008-10-18 20:28:03 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									atomic_set ( & res - > refcount ,  1 ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:36 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									INIT_LIST_HEAD ( & res - > cg_links ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									INIT_LIST_HEAD ( & res - > tasks ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2008-04-29 01:00:11 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									INIT_HLIST_NODE ( & res - > hlist ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:36 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									/* Copy the set of subsystem state objects generated in
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  find_existing_css_set ( )  */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									memcpy ( res - > subsys ,  template ,  sizeof ( res - > subsys ) ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									write_lock ( & css_set_lock ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									/* Add reference counts and links from the new css_set. */ 
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:22 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									list_for_each_entry ( link ,  & oldcg - > cg_links ,  cg_link_list )  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										struct  cgroup  * c  =  link - > cgrp ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										if  ( c - > root  = =  cgrp - > root ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											c  =  cgrp ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										link_css_set ( & tmp_cg_links ,  res ,  c ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									} 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:36 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									BUG_ON ( ! list_empty ( & tmp_cg_links ) ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									css_set_count + + ; 
							 
						 
					
						
							
								
									
										
										
										
											2008-04-29 01:00:11 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									/* Add this cgroup group to the hash table */ 
							 
						 
					
						
							
								
									
										
										
										
											2013-01-10 11:49:27 +08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									key  =  css_set_hash ( res - > subsys ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									hash_add ( css_set_table ,  & res - > hlist ,  key ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2008-04-29 01:00:11 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:36 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									write_unlock ( & css_set_lock ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									return  res ; 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:33 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:22 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								/*
  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  Return  the  cgroup  for  " task "  from  the  given  hierarchy .  Must  be 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  called  with  cgroup_mutex  held . 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								static  struct  cgroup  * task_cgroup_from_root ( struct  task_struct  * task ,  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
													    struct  cgroupfs_root  * root ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									struct  css_set  * css ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									struct  cgroup  * res  =  NULL ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									BUG_ON ( ! mutex_is_locked ( & cgroup_mutex ) ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									read_lock ( & css_set_lock ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									/*
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  No  need  to  lock  the  task  -  since  we  hold  cgroup_mutex  the 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  task  can ' t  change  groups ,  so  the  only  thing  that  can  happen 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  is  that  it  exits  and  its  css  is  set  back  to  init_css_set . 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									css  =  task - > cgroups ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									if  ( css  = =  & init_css_set )  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										res  =  & root - > top_cgroup ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									}  else  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										struct  cg_cgroup_link  * link ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										list_for_each_entry ( link ,  & css - > cg_links ,  cg_link_list )  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											struct  cgroup  * c  =  link - > cgrp ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											if  ( c - > root  = =  root )  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
												res  =  c ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
												break ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											} 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										} 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									} 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									read_unlock ( & css_set_lock ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									BUG_ON ( ! res ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									return  res ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								/*
  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  There  is  one  global  cgroup  mutex .  We  also  require  taking 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  task_lock ( )  when  dereferencing  a  task ' s  cgroup  subsys  pointers . 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  See  " The task_lock() exception " ,  at  the  end  of  this  comment . 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 * 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  A  task  must  hold  cgroup_mutex  to  modify  cgroups . 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 * 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  Any  task  can  increment  and  decrement  the  count  field  without  lock . 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  So  in  general ,  code  holding  cgroup_mutex  can ' t  rely  on  the  count 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  field  not  changing .   However ,  if  the  count  goes  to  zero ,  then  only 
							 
						 
					
						
							
								
									
										
										
										
											2008-02-07 00:14:43 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								 *  cgroup_attach_task ( )  can  increment  it  again .   Because  a  count  of  zero 
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								 *  means  that  no  tasks  are  currently  attached ,  therefore  there  is  no 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  way  a  task  attached  to  that  cgroup  can  fork  ( the  other  way  to 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  increment  the  count ) .   So  code  holding  cgroup_mutex  can  safely 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  assume  that  if  the  count  is  zero ,  it  will  stay  zero .  Similarly ,  if 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  a  task  holds  cgroup_mutex  on  a  cgroup  with  zero  count ,  it 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  knows  that  the  cgroup  won ' t  be  removed ,  as  cgroup_rmdir ( ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  needs  that  mutex . 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 * 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  The  fork  and  exit  callbacks  cgroup_fork ( )  and  cgroup_exit ( ) ,  don ' t 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  ( usually )  take  cgroup_mutex .   These  are  the  two  most  performance 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  critical  pieces  of  code  here .   The  exception  occurs  on  cgroup_exit ( ) , 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  when  a  task  in  a  notify_on_release  cgroup  exits .   Then  cgroup_mutex 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  is  taken ,  and  if  the  cgroup  count  is  zero ,  a  usermode  call  made 
							 
						 
					
						
							
								
									
										
										
										
											2008-02-23 15:24:09 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								 *  to  the  release  agent  with  the  name  of  the  cgroup  ( path  relative  to 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  the  root  of  cgroup  file  system )  as  the  argument . 
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								 * 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  A  cgroup  can  only  be  deleted  if  both  its  ' count '  of  using  tasks 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  is  zero ,  and  its  list  of  ' children '  cgroups  is  empty .   Since  all 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  tasks  in  the  system  use  _some_  cgroup ,  and  since  there  is  always  at 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  least  one  task  in  the  system  ( init ,  pid  = =  1 ) ,  therefore ,  top_cgroup 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  always  has  either  children  cgroups  and / or  using  tasks .   So  we  don ' t 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  need  a  special  hack  to  ensure  that  top_cgroup  cannot  be  deleted . 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 * 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 * 	The  task_lock ( )  exception 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 * 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  The  need  for  this  exception  arises  from  the  action  of 
							 
						 
					
						
							
								
									
										
										
										
											2012-11-20 22:06:18 +08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								 *  cgroup_attach_task ( ) ,  which  overwrites  one  task ' s  cgroup  pointer  with 
							 
						 
					
						
							
								
									
										
										
										
											2008-02-23 15:24:09 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								 *  another .   It  does  so  using  cgroup_mutex ,  however  there  are 
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								 *  several  performance  critical  places  that  need  to  reference 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  task - > cgroup  without  the  expense  of  grabbing  a  system  global 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  mutex .   Therefore  except  as  noted  below ,  when  dereferencing  or ,  as 
							 
						 
					
						
							
								
									
										
										
										
											2012-11-20 22:06:18 +08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								 *  in  cgroup_attach_task ( ) ,  modifying  a  task ' s  cgroup  pointer  we  use 
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								 *  task_lock ( ) ,  which  acts  on  a  spinlock  ( task - > alloc_lock )  already  in 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  the  task_struct  routinely  used  for  such  matters . 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 * 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  P . S .   One  more  locking  exception .   RCU  is  used  to  guard  the 
							 
						 
					
						
							
								
									
										
										
										
											2008-02-07 00:14:43 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								 *  update  of  a  tasks  cgroup  pointer  by  cgroup_attach_task ( ) 
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								 */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								/**
  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  cgroup_lock  -  lock  out  any  changes  to  cgroup  structures 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 * 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								void  cgroup_lock ( void )  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									mutex_lock ( & cgroup_mutex ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
									
										
										
										
											2010-03-10 15:22:11 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								EXPORT_SYMBOL_GPL ( cgroup_lock ) ;  
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								/**
  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  cgroup_unlock  -  release  lock  on  cgroup  changes 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 * 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  Undo  the  lock  taken  in  a  previous  cgroup_lock ( )  call . 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								void  cgroup_unlock ( void )  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									mutex_unlock ( & cgroup_mutex ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
									
										
										
										
											2010-03-10 15:22:11 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								EXPORT_SYMBOL_GPL ( cgroup_unlock ) ;  
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								/*
  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  A  couple  of  forward  declarations  required ,  due  to  cyclic  reference  loop : 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  cgroup_mkdir  - >  cgroup_create  - >  cgroup_populate_dir  - > 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  cgroup_add_file  - >  cgroup_create_file  - >  cgroup_dir_inode_operations 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  - >  cgroup_mkdir . 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2011-07-26 01:41:39 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								static  int  cgroup_mkdir ( struct  inode  * dir ,  struct  dentry  * dentry ,  umode_t  mode ) ;  
						 
					
						
							
								
									
										
										
										
											2012-06-10 17:13:09 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								static  struct  dentry  * cgroup_lookup ( struct  inode  * ,  struct  dentry  * ,  unsigned  int ) ;  
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								static  int  cgroup_rmdir ( struct  inode  * unused_dir ,  struct  dentry  * dentry ) ;  
						 
					
						
							
								
									
										
										
										
											2012-08-23 16:53:29 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								static  int  cgroup_populate_dir ( struct  cgroup  * cgrp ,  bool  base_files ,  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											       unsigned  long  subsys_mask ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2009-09-21 17:01:11 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								static  const  struct  inode_operations  cgroup_dir_inode_operations ;  
						 
					
						
							
								
									
										
										
										
											2009-10-01 15:43:56 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								static  const  struct  file_operations  proc_cgroupstats_operations ;  
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:35 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								static  struct  backing_dev_info  cgroup_backing_dev_info  =  {  
						 
					
						
							
								
									
										
										
										
											2009-06-12 14:45:52 +02:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									. name 		=  " cgroup " , 
							 
						 
					
						
							
								
									
										
										
										
											2008-04-30 00:54:37 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									. capabilities 	=  BDI_CAP_NO_ACCT_AND_WRITEBACK , 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:35 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								} ;  
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
											 
										
											
												cgroup: CSS ID support
Patch for Per-CSS(Cgroup Subsys State) ID and private hierarchy code.
This patch attaches unique ID to each css and provides following.
 - css_lookup(subsys, id)
   returns pointer to struct cgroup_subysys_state of id.
 - css_get_next(subsys, id, rootid, depth, foundid)
   returns the next css under "root" by scanning
When cgroup_subsys->use_id is set, an id for css is maintained.
The cgroup framework only parepares
	- css_id of root css for subsys
	- id is automatically attached at creation of css.
	- id is *not* freed automatically. Because the cgroup framework
	  don't know lifetime of cgroup_subsys_state.
	  free_css_id() function is provided. This must be called by subsys.
There are several reasons to develop this.
	- Saving space .... For example, memcg's swap_cgroup is array of
	  pointers to cgroup. But it is not necessary to be very fast.
	  By replacing pointers(8bytes per ent) to ID (2byes per ent), we can
	  reduce much amount of memory usage.
	- Scanning without lock.
	  CSS_ID provides "scan id under this ROOT" function. By this, scanning
	  css under root can be written without locks.
	  ex)
	  do {
		rcu_read_lock();
		next = cgroup_get_next(subsys, id, root, &found);
		/* check sanity of next here */
		css_tryget();
		rcu_read_unlock();
		id = found + 1
	 } while(...)
Characteristics:
	- Each css has unique ID under subsys.
	- Lifetime of ID is controlled by subsys.
	- css ID contains "ID" and "Depth in hierarchy" and stack of hierarchy
	- Allowed ID is 1-65535, ID 0 is UNUSED ID.
Design Choices:
	- scan-by-ID v.s. scan-by-tree-walk.
	  As /proc's pid scan does, scan-by-ID is robust when scanning is done
	  by following kind of routine.
	  scan -> rest a while(release a lock) -> conitunue from interrupted
	  memcg's hierarchical reclaim does this.
	- When subsys->use_id is set, # of css in the system is limited to
	  65535.
[bharata@linux.vnet.ibm.com: remove rcu_read_lock() from css_get_next()]
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Paul Menage <menage@google.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2009-04-02 16:57:25 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								static  int  alloc_css_id ( struct  cgroup_subsys  * ss ,  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											struct  cgroup  * parent ,  struct  cgroup  * child ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2011-07-26 01:55:55 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								static  struct  inode  * cgroup_new_inode ( umode_t  mode ,  struct  super_block  * sb )  
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									struct  inode  * inode  =  new_inode ( sb ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									if  ( inode )  { 
							 
						 
					
						
							
								
									
										
										
										
											2010-10-23 11:19:54 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										inode - > i_ino  =  get_next_ino ( ) ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
										inode - > i_mode  =  mode ; 
							 
						 
					
						
							
								
									
										
										
										
											2008-11-14 10:39:12 +11:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										inode - > i_uid  =  current_fsuid ( ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										inode - > i_gid  =  current_fsgid ( ) ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
										inode - > i_atime  =  inode - > i_mtime  =  inode - > i_ctime  =  CURRENT_TIME ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										inode - > i_mapping - > backing_dev_info  =  & cgroup_backing_dev_info ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									} 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									return  inode ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2013-01-24 14:31:42 +08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								static  void  cgroup_free_fn ( struct  work_struct  * work )  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									struct  cgroup  * cgrp  =  container_of ( work ,  struct  cgroup ,  free_work ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									struct  cgroup_subsys  * ss ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									mutex_lock ( & cgroup_mutex ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									/*
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  Release  the  subsystem  state  objects . 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									for_each_subsys ( cgrp - > root ,  ss ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										ss - > css_free ( cgrp ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									cgrp - > root - > number_of_cgroups - - ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									mutex_unlock ( & cgroup_mutex ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									/*
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  Drop  the  active  superblock  reference  that  we  took  when  we 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  created  the  cgroup 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									deactivate_super ( cgrp - > root - > sb ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									/*
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  if  we ' re  getting  rid  of  the  cgroup ,  refcount  should  ensure 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  that  there  are  no  pidlists  left . 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									BUG_ON ( ! list_empty ( & cgrp - > pidlists ) ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									simple_xattrs_free ( & cgrp - > xattrs ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									ida_simple_remove ( & cgrp - > root - > cgroup_ida ,  cgrp - > id ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									kfree ( cgrp ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								static  void  cgroup_free_rcu ( struct  rcu_head  * head )  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									struct  cgroup  * cgrp  =  container_of ( head ,  struct  cgroup ,  rcu_head ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									schedule_work ( & cgrp - > free_work ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								static  void  cgroup_diput ( struct  dentry  * dentry ,  struct  inode  * inode )  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									/* is dentry a directory ? if so, kfree() associated cgroup */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									if  ( S_ISDIR ( inode - > i_mode ) )  { 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:40:44 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										struct  cgroup  * cgrp  =  dentry - > d_fsdata ; 
							 
						 
					
						
							
								
									
										
										
										
											2013-01-24 14:31:42 +08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:40:44 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										BUG_ON ( ! ( cgroup_is_removed ( cgrp ) ) ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2013-01-24 14:31:42 +08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										call_rcu ( & cgrp - > rcu_head ,  cgroup_free_rcu ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2012-04-01 12:09:56 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									}  else  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										struct  cfent  * cfe  =  __d_cfe ( dentry ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										struct  cgroup  * cgrp  =  dentry - > d_parent - > d_fsdata ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												cgroup: add xattr support
This is one of the items in the plumber's wish list.
For use cases:
>> What would the use case be for this?
>
> Attaching meta information to services, in an easily discoverable
> way. For example, in systemd we create one cgroup for each service, and
> could then store data like the main pid of the specific service as an
> xattr on the cgroup itself. That way we'd have almost all service state
> in the cgroupfs, which would make it possible to terminate systemd and
> later restart it without losing any state information. But there's more:
> for example, some very peculiar services cannot be terminated on
> shutdown (i.e. fakeraid DM stuff) and it would be really nice if the
> services in question could just mark that on their cgroup, by setting an
> xattr. On the more desktopy side of things there are other
> possibilities: for example there are plans defining what an application
> is along the lines of a cgroup (i.e. an app being a collection of
> processes). With xattrs one could then attach an icon or human readable
> program name on the cgroup.
>
> The key idea is that this would allow attaching runtime meta information
> to cgroups and everything they model (services, apps, vms), that doesn't
> need any complex userspace infrastructure, has good access control
> (i.e. because the file system enforces that anyway, and there's the
> "trusted." xattr namespace), notifications (inotify), and can easily be
> shared among applications.
>
> Lennart
v7:
- no changes
v6:
- remove user xattr namespace, only allow trusted and security
v5:
- check for capabilities before setting/removing xattrs
v4:
- no changes
v3:
- instead of config option, use mount option to enable xattr support
Original-patch-by: Li Zefan <lizefan@huawei.com>
Cc: Li Zefan <lizefan@huawei.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Hillf Danton <dhillf@gmail.com>
Cc: Lennart Poettering <lpoetter@redhat.com>
Signed-off-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Aristeu Rozanski <aris@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
											 
										 
										
											2012-08-23 16:53:30 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										struct  cftype  * cft  =  cfe - > type ; 
							 
						 
					
						
							
								
									
										
										
										
											2012-04-01 12:09:56 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										WARN_ONCE ( ! list_empty ( & cfe - > node )  & & 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											  cgrp  ! =  & cgrp - > root - > top_cgroup , 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											  " cfe still linked for %s \n " ,  cfe - > type - > name ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										kfree ( cfe ) ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												cgroup: add xattr support
This is one of the items in the plumber's wish list.
For use cases:
>> What would the use case be for this?
>
> Attaching meta information to services, in an easily discoverable
> way. For example, in systemd we create one cgroup for each service, and
> could then store data like the main pid of the specific service as an
> xattr on the cgroup itself. That way we'd have almost all service state
> in the cgroupfs, which would make it possible to terminate systemd and
> later restart it without losing any state information. But there's more:
> for example, some very peculiar services cannot be terminated on
> shutdown (i.e. fakeraid DM stuff) and it would be really nice if the
> services in question could just mark that on their cgroup, by setting an
> xattr. On the more desktopy side of things there are other
> possibilities: for example there are plans defining what an application
> is along the lines of a cgroup (i.e. an app being a collection of
> processes). With xattrs one could then attach an icon or human readable
> program name on the cgroup.
>
> The key idea is that this would allow attaching runtime meta information
> to cgroups and everything they model (services, apps, vms), that doesn't
> need any complex userspace infrastructure, has good access control
> (i.e. because the file system enforces that anyway, and there's the
> "trusted." xattr namespace), notifications (inotify), and can easily be
> shared among applications.
>
> Lennart
v7:
- no changes
v6:
- remove user xattr namespace, only allow trusted and security
v5:
- check for capabilities before setting/removing xattrs
v4:
- no changes
v3:
- instead of config option, use mount option to enable xattr support
Original-patch-by: Li Zefan <lizefan@huawei.com>
Cc: Li Zefan <lizefan@huawei.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Hillf Danton <dhillf@gmail.com>
Cc: Lennart Poettering <lpoetter@redhat.com>
Signed-off-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Aristeu Rozanski <aris@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
											 
										 
										
											2012-08-23 16:53:30 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										simple_xattrs_free ( & cft - > xattrs ) ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
									} 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									iput ( inode ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2011-01-14 05:31:45 +00:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								static  int  cgroup_delete ( const  struct  dentry  * d )  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									return  1 ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								static  void  remove_dir ( struct  dentry  * d )  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									struct  dentry  * parent  =  dget ( d - > d_parent ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									d_delete ( d ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									simple_rmdir ( parent - > d_inode ,  d ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									dput ( parent ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2013-01-21 18:18:33 +08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								static  void  cgroup_rm_file ( struct  cgroup  * cgrp ,  const  struct  cftype  * cft )  
						 
					
						
							
								
									
										
										
										
											2012-04-01 12:09:56 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									struct  cfent  * cfe ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									lockdep_assert_held ( & cgrp - > dentry - > d_inode - > i_mutex ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									lockdep_assert_held ( & cgroup_mutex ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2013-01-21 18:18:33 +08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									/*
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  If  we ' re  doing  cleanup  due  to  failure  of  cgroup_create ( ) , 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  the  corresponding  @ cfe  may  not  exist . 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 */ 
							 
						 
					
						
							
								
									
										
										
										
											2012-04-01 12:09:56 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									list_for_each_entry ( cfe ,  & cgrp - > files ,  node )  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										struct  dentry  * d  =  cfe - > dentry ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										if  ( cft  & &  cfe - > type  ! =  cft ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											continue ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										dget ( d ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										d_delete ( d ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2012-07-03 10:38:06 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										simple_unlink ( cgrp - > dentry - > d_inode ,  d ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2012-04-01 12:09:56 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										list_del_init ( & cfe - > node ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										dput ( d ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2013-01-21 18:18:33 +08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										break ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
									} 
							 
						 
					
						
							
								
									
										
										
										
											2012-04-01 12:09:56 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2012-08-23 16:53:29 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								/**
  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  cgroup_clear_directory  -  selective  removal  of  base  and  subsystem  files 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  @ dir :  directory  containing  the  files 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  @ base_files :  true  if  the  base  files  should  be  removed 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  @ subsys_mask :  mask  of  the  subsystem  ids  whose  files  should  be  removed 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								static  void  cgroup_clear_directory ( struct  dentry  * dir ,  bool  base_files ,  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
												   unsigned  long  subsys_mask ) 
							 
						 
					
						
							
								
									
										
										
										
											2012-04-01 12:09:56 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									struct  cgroup  * cgrp  =  __d_cgrp ( dir ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2012-08-23 16:53:29 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									struct  cgroup_subsys  * ss ; 
							 
						 
					
						
							
								
									
										
										
										
											2012-04-01 12:09:56 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2012-08-23 16:53:29 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									for_each_subsys ( cgrp - > root ,  ss )  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										struct  cftype_set  * set ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										if  ( ! test_bit ( ss - > subsys_id ,  & subsys_mask ) ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											continue ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										list_for_each_entry ( set ,  & ss - > cftsets ,  node ) 
							 
						 
					
						
							
								
									
										
										
										
											2012-12-01 00:21:28 +08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
											cgroup_addrm_files ( cgrp ,  NULL ,  set - > cfts ,  false ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2012-08-23 16:53:29 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									} 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									if  ( base_files )  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										while  ( ! list_empty ( & cgrp - > files ) ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											cgroup_rm_file ( cgrp ,  NULL ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									} 
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								/*
  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  NOTE  :  the  dentry  must  have  been  dget ( ) ' ed 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								static  void  cgroup_d_remove_dir ( struct  dentry  * dentry )  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
									
										
										
										
											2011-01-07 17:49:34 +11:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									struct  dentry  * parent ; 
							 
						 
					
						
							
								
									
										
										
										
											2012-08-23 16:53:29 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									struct  cgroupfs_root  * root  =  dentry - > d_sb - > s_fs_info ; 
							 
						 
					
						
							
								
									
										
										
										
											2011-01-07 17:49:34 +11:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2012-08-23 16:53:31 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									cgroup_clear_directory ( dentry ,  true ,  root - > subsys_mask ) ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2011-01-07 17:49:34 +11:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									parent  =  dentry - > d_parent ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									spin_lock ( & parent - > d_lock ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2011-01-14 11:34:34 +08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									spin_lock_nested ( & dentry - > d_lock ,  DENTRY_D_LOCK_NESTED ) ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
									list_del_init ( & dentry - > d_u . d_child ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2011-01-07 17:49:34 +11:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									spin_unlock ( & dentry - > d_lock ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									spin_unlock ( & parent - > d_lock ) ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
									remove_dir ( dentry ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
											 
										
											
												cgroups: revamp subsys array
This patch series provides the ability for cgroup subsystems to be
compiled as modules both within and outside the kernel tree.  This is
mainly useful for classifiers and subsystems that hook into components
that are already modules.  cls_cgroup and blkio-cgroup serve as the
example use cases for this feature.
It provides an interface cgroup_load_subsys() and cgroup_unload_subsys()
which modular subsystems can use to register and depart during runtime.
The net_cls classifier subsystem serves as the example for a subsystem
which can be converted into a module using these changes.
Patch #1 sets up the subsys[] array so its contents can be dynamic as
modules appear and (eventually) disappear.  Iterations over the array are
modified to handle when subsystems are absent, and the dynamic section of
the array is protected by cgroup_mutex.
Patch #2 implements an interface for modules to load subsystems, called
cgroup_load_subsys, similar to cgroup_init_subsys, and adds a module
pointer in struct cgroup_subsys.
Patch #3 adds a mechanism for unloading modular subsystems, which includes
a more advanced rework of the rudimentary reference counting introduced in
patch 2.
Patch #4 modifies the net_cls subsystem, which already had some module
declarations, to be configurable as a module, which also serves as a
simple proof-of-concept.
Part of implementing patches 2 and 4 involved updating css pointers in
each css_set when the module appears or leaves.  In doing this, it was
discovered that css_sets always remain linked to the dummy cgroup,
regardless of whether or not any subsystems are actually bound to it
(i.e., not mounted on an actual hierarchy).  The subsystem loading and
unloading code therefore should keep in mind the special cases where the
added subsystem is the only one in the dummy cgroup (and therefore all
css_sets need to be linked back into it) and where the removed subsys was
the only one in the dummy cgroup (and therefore all css_sets should be
unlinked from it) - however, as all css_sets always stay attached to the
dummy cgroup anyway, these cases are ignored.  Any fix that addresses this
issue should also make sure these cases are addressed in the subsystem
loading and unloading code.
This patch:
Make subsys[] able to be dynamically populated to support modular
subsystems
This patch reworks the way the subsys[] array is used so that subsystems
can register themselves after boot time, and enables the internals of
cgroups to be able to handle when subsystems are not present or may
appear/disappear.
Signed-off-by: Ben Blum <bblum@andrew.cmu.edu>
Acked-by: Li Zefan <lizf@cn.fujitsu.com>
Cc: Paul Menage <menage@google.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2010-03-10 15:22:07 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								/*
  
						 
					
						
							
								
									
										
										
										
											2010-03-10 15:22:09 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								 *  Call  with  cgroup_mutex  held .  Drops  reference  counts  on  modules ,  including 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  any  duplicate  ones  that  parse_cgroupfs_options  took .  If  this  function 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  returns  an  error ,  no  reference  counts  are  touched . 
							 
						 
					
						
							
								
									
										
											 
										
											
												cgroups: revamp subsys array
This patch series provides the ability for cgroup subsystems to be
compiled as modules both within and outside the kernel tree.  This is
mainly useful for classifiers and subsystems that hook into components
that are already modules.  cls_cgroup and blkio-cgroup serve as the
example use cases for this feature.
It provides an interface cgroup_load_subsys() and cgroup_unload_subsys()
which modular subsystems can use to register and depart during runtime.
The net_cls classifier subsystem serves as the example for a subsystem
which can be converted into a module using these changes.
Patch #1 sets up the subsys[] array so its contents can be dynamic as
modules appear and (eventually) disappear.  Iterations over the array are
modified to handle when subsystems are absent, and the dynamic section of
the array is protected by cgroup_mutex.
Patch #2 implements an interface for modules to load subsystems, called
cgroup_load_subsys, similar to cgroup_init_subsys, and adds a module
pointer in struct cgroup_subsys.
Patch #3 adds a mechanism for unloading modular subsystems, which includes
a more advanced rework of the rudimentary reference counting introduced in
patch 2.
Patch #4 modifies the net_cls subsystem, which already had some module
declarations, to be configurable as a module, which also serves as a
simple proof-of-concept.
Part of implementing patches 2 and 4 involved updating css pointers in
each css_set when the module appears or leaves.  In doing this, it was
discovered that css_sets always remain linked to the dummy cgroup,
regardless of whether or not any subsystems are actually bound to it
(i.e., not mounted on an actual hierarchy).  The subsystem loading and
unloading code therefore should keep in mind the special cases where the
added subsystem is the only one in the dummy cgroup (and therefore all
css_sets need to be linked back into it) and where the removed subsys was
the only one in the dummy cgroup (and therefore all css_sets should be
unlinked from it) - however, as all css_sets always stay attached to the
dummy cgroup anyway, these cases are ignored.  Any fix that addresses this
issue should also make sure these cases are addressed in the subsystem
loading and unloading code.
This patch:
Make subsys[] able to be dynamically populated to support modular
subsystems
This patch reworks the way the subsys[] array is used so that subsystems
can register themselves after boot time, and enables the internals of
cgroups to be able to handle when subsystems are not present or may
appear/disappear.
Signed-off-by: Ben Blum <bblum@andrew.cmu.edu>
Acked-by: Li Zefan <lizf@cn.fujitsu.com>
Cc: Paul Menage <menage@google.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2010-03-10 15:22:07 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								 */ 
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								static  int  rebind_subsystems ( struct  cgroupfs_root  * root ,  
						 
					
						
							
								
									
										
										
										
											2012-08-23 16:53:31 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
											      unsigned  long  final_subsys_mask ) 
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
									
										
										
										
											2012-08-23 16:53:31 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									unsigned  long  added_mask ,  removed_mask ; 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:40:44 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									struct  cgroup  * cgrp  =  & root - > top_cgroup ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
									int  i ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
											 
										
											
												cgroups: revamp subsys array
This patch series provides the ability for cgroup subsystems to be
compiled as modules both within and outside the kernel tree.  This is
mainly useful for classifiers and subsystems that hook into components
that are already modules.  cls_cgroup and blkio-cgroup serve as the
example use cases for this feature.
It provides an interface cgroup_load_subsys() and cgroup_unload_subsys()
which modular subsystems can use to register and depart during runtime.
The net_cls classifier subsystem serves as the example for a subsystem
which can be converted into a module using these changes.
Patch #1 sets up the subsys[] array so its contents can be dynamic as
modules appear and (eventually) disappear.  Iterations over the array are
modified to handle when subsystems are absent, and the dynamic section of
the array is protected by cgroup_mutex.
Patch #2 implements an interface for modules to load subsystems, called
cgroup_load_subsys, similar to cgroup_init_subsys, and adds a module
pointer in struct cgroup_subsys.
Patch #3 adds a mechanism for unloading modular subsystems, which includes
a more advanced rework of the rudimentary reference counting introduced in
patch 2.
Patch #4 modifies the net_cls subsystem, which already had some module
declarations, to be configurable as a module, which also serves as a
simple proof-of-concept.
Part of implementing patches 2 and 4 involved updating css pointers in
each css_set when the module appears or leaves.  In doing this, it was
discovered that css_sets always remain linked to the dummy cgroup,
regardless of whether or not any subsystems are actually bound to it
(i.e., not mounted on an actual hierarchy).  The subsystem loading and
unloading code therefore should keep in mind the special cases where the
added subsystem is the only one in the dummy cgroup (and therefore all
css_sets need to be linked back into it) and where the removed subsys was
the only one in the dummy cgroup (and therefore all css_sets should be
unlinked from it) - however, as all css_sets always stay attached to the
dummy cgroup anyway, these cases are ignored.  Any fix that addresses this
issue should also make sure these cases are addressed in the subsystem
loading and unloading code.
This patch:
Make subsys[] able to be dynamically populated to support modular
subsystems
This patch reworks the way the subsys[] array is used so that subsystems
can register themselves after boot time, and enables the internals of
cgroups to be able to handle when subsystems are not present or may
appear/disappear.
Signed-off-by: Ben Blum <bblum@andrew.cmu.edu>
Acked-by: Li Zefan <lizf@cn.fujitsu.com>
Cc: Paul Menage <menage@google.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2010-03-10 15:22:07 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									BUG_ON ( ! mutex_is_locked ( & cgroup_mutex ) ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2011-12-12 18:12:21 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									BUG_ON ( ! mutex_is_locked ( & cgroup_root_mutex ) ) ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												cgroups: revamp subsys array
This patch series provides the ability for cgroup subsystems to be
compiled as modules both within and outside the kernel tree.  This is
mainly useful for classifiers and subsystems that hook into components
that are already modules.  cls_cgroup and blkio-cgroup serve as the
example use cases for this feature.
It provides an interface cgroup_load_subsys() and cgroup_unload_subsys()
which modular subsystems can use to register and depart during runtime.
The net_cls classifier subsystem serves as the example for a subsystem
which can be converted into a module using these changes.
Patch #1 sets up the subsys[] array so its contents can be dynamic as
modules appear and (eventually) disappear.  Iterations over the array are
modified to handle when subsystems are absent, and the dynamic section of
the array is protected by cgroup_mutex.
Patch #2 implements an interface for modules to load subsystems, called
cgroup_load_subsys, similar to cgroup_init_subsys, and adds a module
pointer in struct cgroup_subsys.
Patch #3 adds a mechanism for unloading modular subsystems, which includes
a more advanced rework of the rudimentary reference counting introduced in
patch 2.
Patch #4 modifies the net_cls subsystem, which already had some module
declarations, to be configurable as a module, which also serves as a
simple proof-of-concept.
Part of implementing patches 2 and 4 involved updating css pointers in
each css_set when the module appears or leaves.  In doing this, it was
discovered that css_sets always remain linked to the dummy cgroup,
regardless of whether or not any subsystems are actually bound to it
(i.e., not mounted on an actual hierarchy).  The subsystem loading and
unloading code therefore should keep in mind the special cases where the
added subsystem is the only one in the dummy cgroup (and therefore all
css_sets need to be linked back into it) and where the removed subsys was
the only one in the dummy cgroup (and therefore all css_sets should be
unlinked from it) - however, as all css_sets always stay attached to the
dummy cgroup anyway, these cases are ignored.  Any fix that addresses this
issue should also make sure these cases are addressed in the subsystem
loading and unloading code.
This patch:
Make subsys[] able to be dynamically populated to support modular
subsystems
This patch reworks the way the subsys[] array is used so that subsystems
can register themselves after boot time, and enables the internals of
cgroups to be able to handle when subsystems are not present or may
appear/disappear.
Signed-off-by: Ben Blum <bblum@andrew.cmu.edu>
Acked-by: Li Zefan <lizf@cn.fujitsu.com>
Cc: Paul Menage <menage@google.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2010-03-10 15:22:07 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2012-08-23 16:53:31 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									removed_mask  =  root - > actual_subsys_mask  &  ~ final_subsys_mask ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									added_mask  =  final_subsys_mask  &  ~ root - > actual_subsys_mask ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
									/* Check that any added subsystems are currently free */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									for  ( i  =  0 ;  i  <  CGROUP_SUBSYS_COUNT ;  i + + )  { 
							 
						 
					
						
							
								
									
										
										
										
											2008-02-23 15:24:11 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										unsigned  long  bit  =  1UL  < <  i ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
										struct  cgroup_subsys  * ss  =  subsys [ i ] ; 
							 
						 
					
						
							
								
									
										
										
										
											2012-08-23 16:53:31 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										if  ( ! ( bit  &  added_mask ) ) 
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
											continue ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												cgroups: revamp subsys array
This patch series provides the ability for cgroup subsystems to be
compiled as modules both within and outside the kernel tree.  This is
mainly useful for classifiers and subsystems that hook into components
that are already modules.  cls_cgroup and blkio-cgroup serve as the
example use cases for this feature.
It provides an interface cgroup_load_subsys() and cgroup_unload_subsys()
which modular subsystems can use to register and depart during runtime.
The net_cls classifier subsystem serves as the example for a subsystem
which can be converted into a module using these changes.
Patch #1 sets up the subsys[] array so its contents can be dynamic as
modules appear and (eventually) disappear.  Iterations over the array are
modified to handle when subsystems are absent, and the dynamic section of
the array is protected by cgroup_mutex.
Patch #2 implements an interface for modules to load subsystems, called
cgroup_load_subsys, similar to cgroup_init_subsys, and adds a module
pointer in struct cgroup_subsys.
Patch #3 adds a mechanism for unloading modular subsystems, which includes
a more advanced rework of the rudimentary reference counting introduced in
patch 2.
Patch #4 modifies the net_cls subsystem, which already had some module
declarations, to be configurable as a module, which also serves as a
simple proof-of-concept.
Part of implementing patches 2 and 4 involved updating css pointers in
each css_set when the module appears or leaves.  In doing this, it was
discovered that css_sets always remain linked to the dummy cgroup,
regardless of whether or not any subsystems are actually bound to it
(i.e., not mounted on an actual hierarchy).  The subsystem loading and
unloading code therefore should keep in mind the special cases where the
added subsystem is the only one in the dummy cgroup (and therefore all
css_sets need to be linked back into it) and where the removed subsys was
the only one in the dummy cgroup (and therefore all css_sets should be
unlinked from it) - however, as all css_sets always stay attached to the
dummy cgroup anyway, these cases are ignored.  Any fix that addresses this
issue should also make sure these cases are addressed in the subsystem
loading and unloading code.
This patch:
Make subsys[] able to be dynamically populated to support modular
subsystems
This patch reworks the way the subsys[] array is used so that subsystems
can register themselves after boot time, and enables the internals of
cgroups to be able to handle when subsystems are not present or may
appear/disappear.
Signed-off-by: Ben Blum <bblum@andrew.cmu.edu>
Acked-by: Li Zefan <lizf@cn.fujitsu.com>
Cc: Paul Menage <menage@google.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2010-03-10 15:22:07 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										/*
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										 *  Nobody  should  tell  us  to  do  a  subsys  that  doesn ' t  exist : 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										 *  parse_cgroupfs_options  should  catch  that  case  and  refcounts 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										 *  ensure  that  subsystems  won ' t  disappear  once  selected . 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										 */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										BUG_ON ( ss  = =  NULL ) ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
										if  ( ss - > root  ! =  & rootnode )  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											/* Subsystem isn't free */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											return  - EBUSY ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										} 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									} 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									/* Currently we don't handle adding/removing subsystems when
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  any  child  cgroups  exist .  This  is  theoretically  supportable 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  but  involves  complex  error  handling ,  so  it ' s  being  left  until 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  later  */ 
							 
						 
					
						
							
								
									
										
										
										
											2008-12-15 13:54:22 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									if  ( root - > number_of_cgroups  >  1 ) 
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
										return  - EBUSY ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									/* Process each subsystem */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									for  ( i  =  0 ;  i  <  CGROUP_SUBSYS_COUNT ;  i + + )  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										struct  cgroup_subsys  * ss  =  subsys [ i ] ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										unsigned  long  bit  =  1UL  < <  i ; 
							 
						 
					
						
							
								
									
										
										
										
											2012-08-23 16:53:31 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										if  ( bit  &  added_mask )  { 
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
											/* We're binding this subsystem to this hierarchy */ 
							 
						 
					
						
							
								
									
										
											 
										
											
												cgroups: revamp subsys array
This patch series provides the ability for cgroup subsystems to be
compiled as modules both within and outside the kernel tree.  This is
mainly useful for classifiers and subsystems that hook into components
that are already modules.  cls_cgroup and blkio-cgroup serve as the
example use cases for this feature.
It provides an interface cgroup_load_subsys() and cgroup_unload_subsys()
which modular subsystems can use to register and depart during runtime.
The net_cls classifier subsystem serves as the example for a subsystem
which can be converted into a module using these changes.
Patch #1 sets up the subsys[] array so its contents can be dynamic as
modules appear and (eventually) disappear.  Iterations over the array are
modified to handle when subsystems are absent, and the dynamic section of
the array is protected by cgroup_mutex.
Patch #2 implements an interface for modules to load subsystems, called
cgroup_load_subsys, similar to cgroup_init_subsys, and adds a module
pointer in struct cgroup_subsys.
Patch #3 adds a mechanism for unloading modular subsystems, which includes
a more advanced rework of the rudimentary reference counting introduced in
patch 2.
Patch #4 modifies the net_cls subsystem, which already had some module
declarations, to be configurable as a module, which also serves as a
simple proof-of-concept.
Part of implementing patches 2 and 4 involved updating css pointers in
each css_set when the module appears or leaves.  In doing this, it was
discovered that css_sets always remain linked to the dummy cgroup,
regardless of whether or not any subsystems are actually bound to it
(i.e., not mounted on an actual hierarchy).  The subsystem loading and
unloading code therefore should keep in mind the special cases where the
added subsystem is the only one in the dummy cgroup (and therefore all
css_sets need to be linked back into it) and where the removed subsys was
the only one in the dummy cgroup (and therefore all css_sets should be
unlinked from it) - however, as all css_sets always stay attached to the
dummy cgroup anyway, these cases are ignored.  Any fix that addresses this
issue should also make sure these cases are addressed in the subsystem
loading and unloading code.
This patch:
Make subsys[] able to be dynamically populated to support modular
subsystems
This patch reworks the way the subsys[] array is used so that subsystems
can register themselves after boot time, and enables the internals of
cgroups to be able to handle when subsystems are not present or may
appear/disappear.
Signed-off-by: Ben Blum <bblum@andrew.cmu.edu>
Acked-by: Li Zefan <lizf@cn.fujitsu.com>
Cc: Paul Menage <menage@google.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2010-03-10 15:22:07 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
											BUG_ON ( ss  = =  NULL ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:40:44 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
											BUG_ON ( cgrp - > subsys [ i ] ) ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
											BUG_ON ( ! dummytop - > subsys [ i ] ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											BUG_ON ( dummytop - > subsys [ i ] - > cgroup  ! =  dummytop ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:40:44 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
											cgrp - > subsys [ i ]  =  dummytop - > subsys [ i ] ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											cgrp - > subsys [ i ] - > cgroup  =  cgrp ; 
							 
						 
					
						
							
								
									
										
										
										
											2009-01-07 18:07:42 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
											list_move ( & ss - > sibling ,  & root - > subsys_list ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2009-01-07 18:07:37 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
											ss - > root  =  root ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
											if  ( ss - > bind ) 
							 
						 
					
						
							
								
									
										
										
										
											2012-01-31 13:47:36 +08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
												ss - > bind ( cgrp ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2010-03-10 15:22:09 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
											/* refcount was already taken, and we're keeping it */ 
							 
						 
					
						
							
								
									
										
										
										
											2012-08-23 16:53:31 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										}  else  if  ( bit  &  removed_mask )  { 
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
											/* We're removing this subsystem */ 
							 
						 
					
						
							
								
									
										
											 
										
											
												cgroups: revamp subsys array
This patch series provides the ability for cgroup subsystems to be
compiled as modules both within and outside the kernel tree.  This is
mainly useful for classifiers and subsystems that hook into components
that are already modules.  cls_cgroup and blkio-cgroup serve as the
example use cases for this feature.
It provides an interface cgroup_load_subsys() and cgroup_unload_subsys()
which modular subsystems can use to register and depart during runtime.
The net_cls classifier subsystem serves as the example for a subsystem
which can be converted into a module using these changes.
Patch #1 sets up the subsys[] array so its contents can be dynamic as
modules appear and (eventually) disappear.  Iterations over the array are
modified to handle when subsystems are absent, and the dynamic section of
the array is protected by cgroup_mutex.
Patch #2 implements an interface for modules to load subsystems, called
cgroup_load_subsys, similar to cgroup_init_subsys, and adds a module
pointer in struct cgroup_subsys.
Patch #3 adds a mechanism for unloading modular subsystems, which includes
a more advanced rework of the rudimentary reference counting introduced in
patch 2.
Patch #4 modifies the net_cls subsystem, which already had some module
declarations, to be configurable as a module, which also serves as a
simple proof-of-concept.
Part of implementing patches 2 and 4 involved updating css pointers in
each css_set when the module appears or leaves.  In doing this, it was
discovered that css_sets always remain linked to the dummy cgroup,
regardless of whether or not any subsystems are actually bound to it
(i.e., not mounted on an actual hierarchy).  The subsystem loading and
unloading code therefore should keep in mind the special cases where the
added subsystem is the only one in the dummy cgroup (and therefore all
css_sets need to be linked back into it) and where the removed subsys was
the only one in the dummy cgroup (and therefore all css_sets should be
unlinked from it) - however, as all css_sets always stay attached to the
dummy cgroup anyway, these cases are ignored.  Any fix that addresses this
issue should also make sure these cases are addressed in the subsystem
loading and unloading code.
This patch:
Make subsys[] able to be dynamically populated to support modular
subsystems
This patch reworks the way the subsys[] array is used so that subsystems
can register themselves after boot time, and enables the internals of
cgroups to be able to handle when subsystems are not present or may
appear/disappear.
Signed-off-by: Ben Blum <bblum@andrew.cmu.edu>
Acked-by: Li Zefan <lizf@cn.fujitsu.com>
Cc: Paul Menage <menage@google.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2010-03-10 15:22:07 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
											BUG_ON ( ss  = =  NULL ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:40:44 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
											BUG_ON ( cgrp - > subsys [ i ]  ! =  dummytop - > subsys [ i ] ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											BUG_ON ( cgrp - > subsys [ i ] - > cgroup  ! =  cgrp ) ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
											if  ( ss - > bind ) 
							 
						 
					
						
							
								
									
										
										
										
											2012-01-31 13:47:36 +08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
												ss - > bind ( dummytop ) ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
											dummytop - > subsys [ i ] - > cgroup  =  dummytop ; 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:40:44 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
											cgrp - > subsys [ i ]  =  NULL ; 
							 
						 
					
						
							
								
									
										
										
										
											2009-01-07 18:07:37 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
											subsys [ i ] - > root  =  & rootnode ; 
							 
						 
					
						
							
								
									
										
										
										
											2009-01-07 18:07:42 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
											list_move ( & ss - > sibling ,  & rootnode . subsys_list ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2010-03-10 15:22:09 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
											/* subsystem is now free - drop reference on module */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											module_put ( ss - > module ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2012-08-23 16:53:31 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										}  else  if  ( bit  &  final_subsys_mask )  { 
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
											/* Subsystem state should already exist */ 
							 
						 
					
						
							
								
									
										
											 
										
											
												cgroups: revamp subsys array
This patch series provides the ability for cgroup subsystems to be
compiled as modules both within and outside the kernel tree.  This is
mainly useful for classifiers and subsystems that hook into components
that are already modules.  cls_cgroup and blkio-cgroup serve as the
example use cases for this feature.
It provides an interface cgroup_load_subsys() and cgroup_unload_subsys()
which modular subsystems can use to register and depart during runtime.
The net_cls classifier subsystem serves as the example for a subsystem
which can be converted into a module using these changes.
Patch #1 sets up the subsys[] array so its contents can be dynamic as
modules appear and (eventually) disappear.  Iterations over the array are
modified to handle when subsystems are absent, and the dynamic section of
the array is protected by cgroup_mutex.
Patch #2 implements an interface for modules to load subsystems, called
cgroup_load_subsys, similar to cgroup_init_subsys, and adds a module
pointer in struct cgroup_subsys.
Patch #3 adds a mechanism for unloading modular subsystems, which includes
a more advanced rework of the rudimentary reference counting introduced in
patch 2.
Patch #4 modifies the net_cls subsystem, which already had some module
declarations, to be configurable as a module, which also serves as a
simple proof-of-concept.
Part of implementing patches 2 and 4 involved updating css pointers in
each css_set when the module appears or leaves.  In doing this, it was
discovered that css_sets always remain linked to the dummy cgroup,
regardless of whether or not any subsystems are actually bound to it
(i.e., not mounted on an actual hierarchy).  The subsystem loading and
unloading code therefore should keep in mind the special cases where the
added subsystem is the only one in the dummy cgroup (and therefore all
css_sets need to be linked back into it) and where the removed subsys was
the only one in the dummy cgroup (and therefore all css_sets should be
unlinked from it) - however, as all css_sets always stay attached to the
dummy cgroup anyway, these cases are ignored.  Any fix that addresses this
issue should also make sure these cases are addressed in the subsystem
loading and unloading code.
This patch:
Make subsys[] able to be dynamically populated to support modular
subsystems
This patch reworks the way the subsys[] array is used so that subsystems
can register themselves after boot time, and enables the internals of
cgroups to be able to handle when subsystems are not present or may
appear/disappear.
Signed-off-by: Ben Blum <bblum@andrew.cmu.edu>
Acked-by: Li Zefan <lizf@cn.fujitsu.com>
Cc: Paul Menage <menage@google.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2010-03-10 15:22:07 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
											BUG_ON ( ss  = =  NULL ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:40:44 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
											BUG_ON ( ! cgrp - > subsys [ i ] ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2010-03-10 15:22:09 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
											/*
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											 *  a  refcount  was  taken ,  but  we  already  had  one ,  so 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											 *  drop  the  extra  reference . 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											 */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											module_put ( ss - > module ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								# ifdef CONFIG_MODULE_UNLOAD 
  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											BUG_ON ( ss - > module  & &  ! module_refcount ( ss - > module ) ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								# endif 
  
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
										}  else  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											/* Subsystem state shouldn't exist */ 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:40:44 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
											BUG_ON ( cgrp - > subsys [ i ] ) ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
										} 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									} 
							 
						 
					
						
							
								
									
										
										
										
											2012-08-23 16:53:31 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									root - > subsys_mask  =  root - > actual_subsys_mask  =  final_subsys_mask ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									return  0 ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2011-12-08 21:32:45 -05:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								static  int  cgroup_show_options ( struct  seq_file  * seq ,  struct  dentry  * dentry )  
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
									
										
										
										
											2011-12-08 21:32:45 -05:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									struct  cgroupfs_root  * root  =  dentry - > d_sb - > s_fs_info ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
									struct  cgroup_subsys  * ss ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2011-12-12 18:12:21 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									mutex_lock ( & cgroup_root_mutex ) ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
									for_each_subsys ( root ,  ss ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										seq_printf ( seq ,  " ,%s " ,  ss - > name ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									if  ( test_bit ( ROOT_NOPREFIX ,  & root - > flags ) ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										seq_puts ( seq ,  " ,noprefix " ) ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												cgroup: add xattr support
This is one of the items in the plumber's wish list.
For use cases:
>> What would the use case be for this?
>
> Attaching meta information to services, in an easily discoverable
> way. For example, in systemd we create one cgroup for each service, and
> could then store data like the main pid of the specific service as an
> xattr on the cgroup itself. That way we'd have almost all service state
> in the cgroupfs, which would make it possible to terminate systemd and
> later restart it without losing any state information. But there's more:
> for example, some very peculiar services cannot be terminated on
> shutdown (i.e. fakeraid DM stuff) and it would be really nice if the
> services in question could just mark that on their cgroup, by setting an
> xattr. On the more desktopy side of things there are other
> possibilities: for example there are plans defining what an application
> is along the lines of a cgroup (i.e. an app being a collection of
> processes). With xattrs one could then attach an icon or human readable
> program name on the cgroup.
>
> The key idea is that this would allow attaching runtime meta information
> to cgroups and everything they model (services, apps, vms), that doesn't
> need any complex userspace infrastructure, has good access control
> (i.e. because the file system enforces that anyway, and there's the
> "trusted." xattr namespace), notifications (inotify), and can easily be
> shared among applications.
>
> Lennart
v7:
- no changes
v6:
- remove user xattr namespace, only allow trusted and security
v5:
- check for capabilities before setting/removing xattrs
v4:
- no changes
v3:
- instead of config option, use mount option to enable xattr support
Original-patch-by: Li Zefan <lizefan@huawei.com>
Cc: Li Zefan <lizefan@huawei.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Hillf Danton <dhillf@gmail.com>
Cc: Lennart Poettering <lpoetter@redhat.com>
Signed-off-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Aristeu Rozanski <aris@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
											 
										 
										
											2012-08-23 16:53:30 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									if  ( test_bit ( ROOT_XATTR ,  & root - > flags ) ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										seq_puts ( seq ,  " ,xattr " ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:38 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									if  ( strlen ( root - > release_agent_path ) ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										seq_printf ( seq ,  " ,release_agent=%s " ,  root - > release_agent_path ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2012-11-19 08:13:38 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									if  ( test_bit ( CGRP_CPUSET_CLONE_CHILDREN ,  & root - > top_cgroup . flags ) ) 
							 
						 
					
						
							
								
									
										
										
										
											2010-10-27 15:33:35 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										seq_puts ( seq ,  " ,clone_children " ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:19 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									if  ( strlen ( root - > name ) ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										seq_printf ( seq ,  " ,name=%s " ,  root - > name ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2011-12-12 18:12:21 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									mutex_unlock ( & cgroup_root_mutex ) ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
									return  0 ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								struct  cgroup_sb_opts  {  
						 
					
						
							
								
									
										
										
										
											2012-08-23 16:53:31 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									unsigned  long  subsys_mask ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
									unsigned  long  flags ; 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:38 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									char  * release_agent ; 
							 
						 
					
						
							
								
									
										
										
										
											2012-11-19 08:13:38 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									bool  cpuset_clone_children ; 
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:19 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									char  * name ; 
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:23 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									/* User explicitly requested empty subsystem */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									bool  none ; 
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:19 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									struct  cgroupfs_root  * new_root ; 
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:23 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								} ;  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
											 
										
											
												cgroups: revamp subsys array
This patch series provides the ability for cgroup subsystems to be
compiled as modules both within and outside the kernel tree.  This is
mainly useful for classifiers and subsystems that hook into components
that are already modules.  cls_cgroup and blkio-cgroup serve as the
example use cases for this feature.
It provides an interface cgroup_load_subsys() and cgroup_unload_subsys()
which modular subsystems can use to register and depart during runtime.
The net_cls classifier subsystem serves as the example for a subsystem
which can be converted into a module using these changes.
Patch #1 sets up the subsys[] array so its contents can be dynamic as
modules appear and (eventually) disappear.  Iterations over the array are
modified to handle when subsystems are absent, and the dynamic section of
the array is protected by cgroup_mutex.
Patch #2 implements an interface for modules to load subsystems, called
cgroup_load_subsys, similar to cgroup_init_subsys, and adds a module
pointer in struct cgroup_subsys.
Patch #3 adds a mechanism for unloading modular subsystems, which includes
a more advanced rework of the rudimentary reference counting introduced in
patch 2.
Patch #4 modifies the net_cls subsystem, which already had some module
declarations, to be configurable as a module, which also serves as a
simple proof-of-concept.
Part of implementing patches 2 and 4 involved updating css pointers in
each css_set when the module appears or leaves.  In doing this, it was
discovered that css_sets always remain linked to the dummy cgroup,
regardless of whether or not any subsystems are actually bound to it
(i.e., not mounted on an actual hierarchy).  The subsystem loading and
unloading code therefore should keep in mind the special cases where the
added subsystem is the only one in the dummy cgroup (and therefore all
css_sets need to be linked back into it) and where the removed subsys was
the only one in the dummy cgroup (and therefore all css_sets should be
unlinked from it) - however, as all css_sets always stay attached to the
dummy cgroup anyway, these cases are ignored.  Any fix that addresses this
issue should also make sure these cases are addressed in the subsystem
loading and unloading code.
This patch:
Make subsys[] able to be dynamically populated to support modular
subsystems
This patch reworks the way the subsys[] array is used so that subsystems
can register themselves after boot time, and enables the internals of
cgroups to be able to handle when subsystems are not present or may
appear/disappear.
Signed-off-by: Ben Blum <bblum@andrew.cmu.edu>
Acked-by: Li Zefan <lizf@cn.fujitsu.com>
Cc: Paul Menage <menage@google.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2010-03-10 15:22:07 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								/*
  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  Convert  a  hierarchy  specifier  into  a  bitmask  of  subsystems  and  flags .  Call 
							 
						 
					
						
							
								
									
										
										
										
											2010-03-10 15:22:09 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								 *  with  cgroup_mutex  held  to  protect  the  subsys [ ]  array .  This  function  takes 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  refcounts  on  subsystems  to  be  used ,  unless  it  returns  error ,  in  which  case 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  no  refcounts  are  taken . 
							 
						 
					
						
							
								
									
										
											 
										
											
												cgroups: revamp subsys array
This patch series provides the ability for cgroup subsystems to be
compiled as modules both within and outside the kernel tree.  This is
mainly useful for classifiers and subsystems that hook into components
that are already modules.  cls_cgroup and blkio-cgroup serve as the
example use cases for this feature.
It provides an interface cgroup_load_subsys() and cgroup_unload_subsys()
which modular subsystems can use to register and depart during runtime.
The net_cls classifier subsystem serves as the example for a subsystem
which can be converted into a module using these changes.
Patch #1 sets up the subsys[] array so its contents can be dynamic as
modules appear and (eventually) disappear.  Iterations over the array are
modified to handle when subsystems are absent, and the dynamic section of
the array is protected by cgroup_mutex.
Patch #2 implements an interface for modules to load subsystems, called
cgroup_load_subsys, similar to cgroup_init_subsys, and adds a module
pointer in struct cgroup_subsys.
Patch #3 adds a mechanism for unloading modular subsystems, which includes
a more advanced rework of the rudimentary reference counting introduced in
patch 2.
Patch #4 modifies the net_cls subsystem, which already had some module
declarations, to be configurable as a module, which also serves as a
simple proof-of-concept.
Part of implementing patches 2 and 4 involved updating css pointers in
each css_set when the module appears or leaves.  In doing this, it was
discovered that css_sets always remain linked to the dummy cgroup,
regardless of whether or not any subsystems are actually bound to it
(i.e., not mounted on an actual hierarchy).  The subsystem loading and
unloading code therefore should keep in mind the special cases where the
added subsystem is the only one in the dummy cgroup (and therefore all
css_sets need to be linked back into it) and where the removed subsys was
the only one in the dummy cgroup (and therefore all css_sets should be
unlinked from it) - however, as all css_sets always stay attached to the
dummy cgroup anyway, these cases are ignored.  Any fix that addresses this
issue should also make sure these cases are addressed in the subsystem
loading and unloading code.
This patch:
Make subsys[] able to be dynamically populated to support modular
subsystems
This patch reworks the way the subsys[] array is used so that subsystems
can register themselves after boot time, and enables the internals of
cgroups to be able to handle when subsystems are not present or may
appear/disappear.
Signed-off-by: Ben Blum <bblum@andrew.cmu.edu>
Acked-by: Li Zefan <lizf@cn.fujitsu.com>
Cc: Paul Menage <menage@google.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2010-03-10 15:22:07 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								 */ 
							 
						 
					
						
							
								
									
										
										
										
											2010-03-10 15:22:09 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								static  int  parse_cgroupfs_options ( char  * data ,  struct  cgroup_sb_opts  * opts )  
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
									
										
										
										
											2010-10-27 15:33:37 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									char  * token ,  * o  =  data ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									bool  all_ss  =  false ,  one_ss  =  false ; 
							 
						 
					
						
							
								
									
										
										
										
											2009-06-17 16:26:33 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									unsigned  long  mask  =  ( unsigned  long ) - 1 ; 
							 
						 
					
						
							
								
									
										
										
										
											2010-03-10 15:22:09 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									int  i ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									bool  module_pin_failed  =  false ; 
							 
						 
					
						
							
								
									
										
										
										
											2009-06-17 16:26:33 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
											 
										
											
												cgroups: revamp subsys array
This patch series provides the ability for cgroup subsystems to be
compiled as modules both within and outside the kernel tree.  This is
mainly useful for classifiers and subsystems that hook into components
that are already modules.  cls_cgroup and blkio-cgroup serve as the
example use cases for this feature.
It provides an interface cgroup_load_subsys() and cgroup_unload_subsys()
which modular subsystems can use to register and depart during runtime.
The net_cls classifier subsystem serves as the example for a subsystem
which can be converted into a module using these changes.
Patch #1 sets up the subsys[] array so its contents can be dynamic as
modules appear and (eventually) disappear.  Iterations over the array are
modified to handle when subsystems are absent, and the dynamic section of
the array is protected by cgroup_mutex.
Patch #2 implements an interface for modules to load subsystems, called
cgroup_load_subsys, similar to cgroup_init_subsys, and adds a module
pointer in struct cgroup_subsys.
Patch #3 adds a mechanism for unloading modular subsystems, which includes
a more advanced rework of the rudimentary reference counting introduced in
patch 2.
Patch #4 modifies the net_cls subsystem, which already had some module
declarations, to be configurable as a module, which also serves as a
simple proof-of-concept.
Part of implementing patches 2 and 4 involved updating css pointers in
each css_set when the module appears or leaves.  In doing this, it was
discovered that css_sets always remain linked to the dummy cgroup,
regardless of whether or not any subsystems are actually bound to it
(i.e., not mounted on an actual hierarchy).  The subsystem loading and
unloading code therefore should keep in mind the special cases where the
added subsystem is the only one in the dummy cgroup (and therefore all
css_sets need to be linked back into it) and where the removed subsys was
the only one in the dummy cgroup (and therefore all css_sets should be
unlinked from it) - however, as all css_sets always stay attached to the
dummy cgroup anyway, these cases are ignored.  Any fix that addresses this
issue should also make sure these cases are addressed in the subsystem
loading and unloading code.
This patch:
Make subsys[] able to be dynamically populated to support modular
subsystems
This patch reworks the way the subsys[] array is used so that subsystems
can register themselves after boot time, and enables the internals of
cgroups to be able to handle when subsystems are not present or may
appear/disappear.
Signed-off-by: Ben Blum <bblum@andrew.cmu.edu>
Acked-by: Li Zefan <lizf@cn.fujitsu.com>
Cc: Paul Menage <menage@google.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2010-03-10 15:22:07 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									BUG_ON ( ! mutex_is_locked ( & cgroup_mutex ) ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2009-06-17 16:26:33 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								# ifdef CONFIG_CPUSETS 
  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									mask  =  ~ ( 1UL  < <  cpuset_subsys_id ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								# endif 
  
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:19 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									memset ( opts ,  0 ,  sizeof ( * opts ) ) ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									while  ( ( token  =  strsep ( & o ,  " , " ) )  ! =  NULL )  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										if  ( ! * token ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											return  - EINVAL ; 
							 
						 
					
						
							
								
									
										
										
										
											2010-10-27 15:33:37 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										if  ( ! strcmp ( token ,  " none " ) )  { 
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:23 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
											/* Explicitly have no subsystems */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											opts - > none  =  true ; 
							 
						 
					
						
							
								
									
										
										
										
											2010-10-27 15:33:37 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
											continue ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										} 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										if  ( ! strcmp ( token ,  " all " ) )  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											/* Mutually exclusive option 'all' + subsystem name */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											if  ( one_ss ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
												return  - EINVAL ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											all_ss  =  true ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											continue ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										} 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										if  ( ! strcmp ( token ,  " noprefix " ) )  { 
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
											set_bit ( ROOT_NOPREFIX ,  & opts - > flags ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2010-10-27 15:33:37 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
											continue ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										} 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										if  ( ! strcmp ( token ,  " clone_children " ) )  { 
							 
						 
					
						
							
								
									
										
										
										
											2012-11-19 08:13:38 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
											opts - > cpuset_clone_children  =  true ; 
							 
						 
					
						
							
								
									
										
										
										
											2010-10-27 15:33:37 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
											continue ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										} 
							 
						 
					
						
							
								
									
										
											 
										
											
												cgroup: add xattr support
This is one of the items in the plumber's wish list.
For use cases:
>> What would the use case be for this?
>
> Attaching meta information to services, in an easily discoverable
> way. For example, in systemd we create one cgroup for each service, and
> could then store data like the main pid of the specific service as an
> xattr on the cgroup itself. That way we'd have almost all service state
> in the cgroupfs, which would make it possible to terminate systemd and
> later restart it without losing any state information. But there's more:
> for example, some very peculiar services cannot be terminated on
> shutdown (i.e. fakeraid DM stuff) and it would be really nice if the
> services in question could just mark that on their cgroup, by setting an
> xattr. On the more desktopy side of things there are other
> possibilities: for example there are plans defining what an application
> is along the lines of a cgroup (i.e. an app being a collection of
> processes). With xattrs one could then attach an icon or human readable
> program name on the cgroup.
>
> The key idea is that this would allow attaching runtime meta information
> to cgroups and everything they model (services, apps, vms), that doesn't
> need any complex userspace infrastructure, has good access control
> (i.e. because the file system enforces that anyway, and there's the
> "trusted." xattr namespace), notifications (inotify), and can easily be
> shared among applications.
>
> Lennart
v7:
- no changes
v6:
- remove user xattr namespace, only allow trusted and security
v5:
- check for capabilities before setting/removing xattrs
v4:
- no changes
v3:
- instead of config option, use mount option to enable xattr support
Original-patch-by: Li Zefan <lizefan@huawei.com>
Cc: Li Zefan <lizefan@huawei.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Hillf Danton <dhillf@gmail.com>
Cc: Lennart Poettering <lpoetter@redhat.com>
Signed-off-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Aristeu Rozanski <aris@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
											 
										 
										
											2012-08-23 16:53:30 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										if  ( ! strcmp ( token ,  " xattr " ) )  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											set_bit ( ROOT_XATTR ,  & opts - > flags ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											continue ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										} 
							 
						 
					
						
							
								
									
										
										
										
											2010-10-27 15:33:37 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										if  ( ! strncmp ( token ,  " release_agent= " ,  14 ) )  { 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:38 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
											/* Specifying two release agents is forbidden */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											if  ( opts - > release_agent ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
												return  - EINVAL ; 
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:19 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
											opts - > release_agent  = 
							 
						 
					
						
							
								
									
										
										
										
											2010-08-10 18:02:54 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
												kstrndup ( token  +  14 ,  PATH_MAX  -  1 ,  GFP_KERNEL ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:38 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
											if  ( ! opts - > release_agent ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
												return  - ENOMEM ; 
							 
						 
					
						
							
								
									
										
										
										
											2010-10-27 15:33:37 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
											continue ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										} 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										if  ( ! strncmp ( token ,  " name= " ,  5 ) )  { 
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:19 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
											const  char  * name  =  token  +  5 ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											/* Can't specify an empty name */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											if  ( ! strlen ( name ) ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
												return  - EINVAL ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											/* Must match [\w.-]+ */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											for  ( i  =  0 ;  i  <  strlen ( name ) ;  i + + )  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
												char  c  =  name [ i ] ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
												if  ( isalnum ( c ) ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
													continue ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
												if  ( ( c  = =  ' . ' )  | |  ( c  = =  ' - ' )  | |  ( c  = =  ' _ ' ) ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
													continue ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
												return  - EINVAL ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											} 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											/* Specifying two names is forbidden */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											if  ( opts - > name ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
												return  - EINVAL ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											opts - > name  =  kstrndup ( name , 
							 
						 
					
						
							
								
									
										
										
										
											2010-08-10 18:02:54 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
													      MAX_CGROUP_ROOT_NAMELEN  -  1 , 
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:19 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
													      GFP_KERNEL ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											if  ( ! opts - > name ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
												return  - ENOMEM ; 
							 
						 
					
						
							
								
									
										
										
										
											2010-10-27 15:33:37 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											continue ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										} 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										for  ( i  =  0 ;  i  <  CGROUP_SUBSYS_COUNT ;  i + + )  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											struct  cgroup_subsys  * ss  =  subsys [ i ] ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											if  ( ss  = =  NULL ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
												continue ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											if  ( strcmp ( token ,  ss - > name ) ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
												continue ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											if  ( ss - > disabled ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
												continue ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											/* Mutually exclusive option 'all' + subsystem name */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											if  ( all_ss ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
												return  - EINVAL ; 
							 
						 
					
						
							
								
									
										
										
										
											2012-08-23 16:53:31 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
											set_bit ( i ,  & opts - > subsys_mask ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2010-10-27 15:33:37 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
											one_ss  =  true ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											break ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										} 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										if  ( i  = =  CGROUP_SUBSYS_COUNT ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											return  - ENOENT ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									} 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									/*
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  If  the  ' all '  option  was  specified  select  all  the  subsystems , 
							 
						 
					
						
							
								
									
										
										
										
											2011-12-27 14:25:55 +08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									 *  otherwise  if  ' none ' ,  ' name = '  and  a  subsystem  name  options 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  were  not  specified ,  let ' s  default  to  ' all ' 
							 
						 
					
						
							
								
									
										
										
										
											2010-10-27 15:33:37 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									 */ 
							 
						 
					
						
							
								
									
										
										
										
											2011-12-27 14:25:55 +08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									if  ( all_ss  | |  ( ! one_ss  & &  ! opts - > none  & &  ! opts - > name ) )  { 
							 
						 
					
						
							
								
									
										
										
										
											2010-10-27 15:33:37 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										for  ( i  =  0 ;  i  <  CGROUP_SUBSYS_COUNT ;  i + + )  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											struct  cgroup_subsys  * ss  =  subsys [ i ] ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											if  ( ss  = =  NULL ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
												continue ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											if  ( ss - > disabled ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
												continue ; 
							 
						 
					
						
							
								
									
										
										
										
											2012-08-23 16:53:31 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
											set_bit ( i ,  & opts - > subsys_mask ) ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
										} 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									} 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:23 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									/* Consistency checks */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2009-06-17 16:26:33 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									/*
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  Option  noprefix  was  introduced  just  for  backward  compatibility 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  with  the  old  cpuset ,  so  we  allow  noprefix  only  if  mounting  just 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  the  cpuset  subsystem . 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									if  ( test_bit ( ROOT_NOPREFIX ,  & opts - > flags )  & & 
							 
						 
					
						
							
								
									
										
										
										
											2012-08-23 16:53:31 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									    ( opts - > subsys_mask  &  mask ) ) 
							 
						 
					
						
							
								
									
										
										
										
											2009-06-17 16:26:33 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										return  - EINVAL ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:23 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									/* Can't specify "none" and some subsystems */ 
							 
						 
					
						
							
								
									
										
										
										
											2012-08-23 16:53:31 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									if  ( opts - > subsys_mask  & &  opts - > none ) 
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:23 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										return  - EINVAL ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									/*
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  We  either  have  to  specify  by  name  or  by  subsystems .  ( So  all 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  empty  hierarchies  must  have  a  name ) . 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 */ 
							 
						 
					
						
							
								
									
										
										
										
											2012-08-23 16:53:31 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									if  ( ! opts - > subsys_mask  & &  ! opts - > name ) 
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
										return  - EINVAL ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2010-03-10 15:22:09 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									/*
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  Grab  references  on  all  the  modules  we ' ll  need ,  so  the  subsystems 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  don ' t  dance  around  before  rebind_subsystems  attaches  them .  This  may 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  take  duplicate  reference  counts  on  a  subsystem  that ' s  already  used , 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  but  rebind_subsystems  handles  this  case . 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 */ 
							 
						 
					
						
							
								
									
										
										
										
											2012-09-13 09:50:55 +02:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									for  ( i  =  0 ;  i  <  CGROUP_SUBSYS_COUNT ;  i + + )  { 
							 
						 
					
						
							
								
									
										
										
										
											2010-03-10 15:22:09 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										unsigned  long  bit  =  1UL  < <  i ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2012-08-23 16:53:31 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										if  ( ! ( bit  &  opts - > subsys_mask ) ) 
							 
						 
					
						
							
								
									
										
										
										
											2010-03-10 15:22:09 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
											continue ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										if  ( ! try_module_get ( subsys [ i ] - > module ) )  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											module_pin_failed  =  true ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											break ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										} 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									} 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									if  ( module_pin_failed )  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										/*
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										 *  oops ,  one  of  the  modules  was  going  away .  this  means  that  we 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										 *  raced  with  a  module_delete  call ,  and  to  the  user  this  is 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										 *  essentially  a  " subsystem doesn't exist "  case . 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										 */ 
							 
						 
					
						
							
								
									
										
										
										
											2012-09-13 09:50:55 +02:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										for  ( i - - ;  i  > =  0 ;  i - - )  { 
							 
						 
					
						
							
								
									
										
										
										
											2010-03-10 15:22:09 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
											/* drop refcounts only on the ones we took */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											unsigned  long  bit  =  1UL  < <  i ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2012-08-23 16:53:31 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
											if  ( ! ( bit  &  opts - > subsys_mask ) ) 
							 
						 
					
						
							
								
									
										
										
										
											2010-03-10 15:22:09 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
												continue ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											module_put ( subsys [ i ] - > module ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										} 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										return  - ENOENT ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									} 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
									return  0 ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2012-08-23 16:53:31 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								static  void  drop_parsed_module_refcounts ( unsigned  long  subsys_mask )  
						 
					
						
							
								
									
										
										
										
											2010-03-10 15:22:09 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									int  i ; 
							 
						 
					
						
							
								
									
										
										
										
											2012-09-13 09:50:55 +02:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									for  ( i  =  0 ;  i  <  CGROUP_SUBSYS_COUNT ;  i + + )  { 
							 
						 
					
						
							
								
									
										
										
										
											2010-03-10 15:22:09 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										unsigned  long  bit  =  1UL  < <  i ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2012-08-23 16:53:31 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										if  ( ! ( bit  &  subsys_mask ) ) 
							 
						 
					
						
							
								
									
										
										
										
											2010-03-10 15:22:09 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
											continue ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										module_put ( subsys [ i ] - > module ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									} 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								static  int  cgroup_remount ( struct  super_block  * sb ,  int  * flags ,  char  * data )  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									int  ret  =  0 ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									struct  cgroupfs_root  * root  =  sb - > s_fs_info ; 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:40:44 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									struct  cgroup  * cgrp  =  & root - > top_cgroup ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
									struct  cgroup_sb_opts  opts ; 
							 
						 
					
						
							
								
									
										
										
										
											2012-08-23 16:53:31 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									unsigned  long  added_mask ,  removed_mask ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:40:44 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									mutex_lock ( & cgrp - > dentry - > d_inode - > i_mutex ) ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
									mutex_lock ( & cgroup_mutex ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2011-12-12 18:12:21 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									mutex_lock ( & cgroup_root_mutex ) ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									/* See what subsystems are wanted */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									ret  =  parse_cgroupfs_options ( data ,  & opts ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									if  ( ret ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										goto  out_unlock ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2012-08-23 16:53:31 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									if  ( opts . subsys_mask  ! =  root - > actual_subsys_mask  | |  opts . release_agent ) 
							 
						 
					
						
							
								
									
										
										
										
											2012-04-01 12:09:54 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										pr_warning ( " cgroup: option changes via remount are deprecated (pid=%d comm=%s) \n " , 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											   task_tgid_nr ( current ) ,  current - > comm ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2012-08-23 16:53:31 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									added_mask  =  opts . subsys_mask  &  ~ root - > subsys_mask ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									removed_mask  =  root - > subsys_mask  &  ~ opts . subsys_mask ; 
							 
						 
					
						
							
								
									
										
										
										
											2012-08-23 16:53:29 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2010-03-10 15:22:09 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									/* Don't allow flags or name to change at remount */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									if  ( opts . flags  ! =  root - > flags  | | 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									    ( opts . name  & &  strcmp ( opts . name ,  root - > name ) ) )  { 
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:19 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										ret  =  - EINVAL ; 
							 
						 
					
						
							
								
									
										
										
										
											2012-08-23 16:53:31 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										drop_parsed_module_refcounts ( opts . subsys_mask ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:19 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										goto  out_unlock ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									} 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2012-12-03 09:28:18 +08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									/*
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  Clear  out  the  files  of  subsystems  that  should  be  removed ,  do 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  this  before  rebind_subsystems ,  since  rebind_subsystems  may 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  change  this  hierarchy ' s  subsys_list . 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									cgroup_clear_directory ( cgrp - > dentry ,  false ,  removed_mask ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2012-08-23 16:53:31 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									ret  =  rebind_subsystems ( root ,  opts . subsys_mask ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2010-03-10 15:22:09 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									if  ( ret )  { 
							 
						 
					
						
							
								
									
										
										
										
											2012-12-03 09:28:18 +08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										/* rebind_subsystems failed, re-populate the removed files */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										cgroup_populate_dir ( cgrp ,  false ,  removed_mask ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2012-08-23 16:53:31 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										drop_parsed_module_refcounts ( opts . subsys_mask ) ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												cgroups: don't change release_agent when remount failed
Remount can fail in either case:
  - wrong mount options is specified, or option 'noprefix' is changed.
  - a to-be-added subsys is already mounted/active.
When using remount to change 'release_agent', for the above former failure
case, remount will return errno with release_agent unchanged, but for the
latter case, remount will return EBUSY with relase_agent changed, which is
unexpected I think:
 # mount -t cgroup -o cpu xxx /cgrp1
 # mount -t cgroup -o cpuset,release_agent=agent1 yyy /cgrp2
 # cat /cgrp2/release_agent
 agent1
 # mount -t cgroup -o remount,cpuset,noprefix,release_agent=agent2 yyy /cgrp2
 mount: /cgrp2 not mounted already, or bad option
 # cat /cgrp2/release_agent
 agent1     <-- ok
 # mount -t cgroup -o remount,cpu,cpuset,release_agent=agent2 yyy /cgrp2
 mount: /cgrp2 is busy
 # cat /cgrp2/release_agent
 agent2     <-- unexpected!
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Cc: Paul Menage <menage@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2009-04-02 16:57:30 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										goto  out_unlock ; 
							 
						 
					
						
							
								
									
										
										
										
											2010-03-10 15:22:09 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									} 
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2012-08-23 16:53:29 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									/* re-populate subsystem files */ 
							 
						 
					
						
							
								
									
										
										
										
											2012-08-23 16:53:31 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									cgroup_populate_dir ( cgrp ,  false ,  added_mask ) ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:38 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									if  ( opts . release_agent ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										strcpy ( root - > release_agent_path ,  opts . release_agent ) ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								 out_unlock : 
							 
						 
					
						
							
								
									
										
										
										
											2009-04-02 16:57:27 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									kfree ( opts . release_agent ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:19 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									kfree ( opts . name ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2011-12-12 18:12:21 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									mutex_unlock ( & cgroup_root_mutex ) ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
									mutex_unlock ( & cgroup_mutex ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:40:44 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									mutex_unlock ( & cgrp - > dentry - > d_inode - > i_mutex ) ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
									return  ret ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2009-09-21 17:01:09 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								static  const  struct  super_operations  cgroup_ops  =  {  
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
									. statfs  =  simple_statfs , 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									. drop_inode  =  generic_delete_inode , 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									. show_options  =  cgroup_show_options , 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									. remount_fs  =  cgroup_remount , 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								} ;  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2008-10-18 20:28:04 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								static  void  init_cgroup_housekeeping ( struct  cgroup  * cgrp )  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									INIT_LIST_HEAD ( & cgrp - > sibling ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									INIT_LIST_HEAD ( & cgrp - > children ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2012-04-01 12:09:56 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									INIT_LIST_HEAD ( & cgrp - > files ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2008-10-18 20:28:04 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									INIT_LIST_HEAD ( & cgrp - > css_sets ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2012-11-19 08:13:35 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									INIT_LIST_HEAD ( & cgrp - > allcg_node ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2008-10-18 20:28:04 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									INIT_LIST_HEAD ( & cgrp - > release_list ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:27 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									INIT_LIST_HEAD ( & cgrp - > pidlists ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2013-01-24 14:31:42 +08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									INIT_WORK ( & cgrp - > free_work ,  cgroup_free_fn ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:27 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									mutex_init ( & cgrp - > pidlist_mutex ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2010-03-10 15:22:20 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									INIT_LIST_HEAD ( & cgrp - > event_list ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									spin_lock_init ( & cgrp - > event_list_lock ) ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												cgroup: add xattr support
This is one of the items in the plumber's wish list.
For use cases:
>> What would the use case be for this?
>
> Attaching meta information to services, in an easily discoverable
> way. For example, in systemd we create one cgroup for each service, and
> could then store data like the main pid of the specific service as an
> xattr on the cgroup itself. That way we'd have almost all service state
> in the cgroupfs, which would make it possible to terminate systemd and
> later restart it without losing any state information. But there's more:
> for example, some very peculiar services cannot be terminated on
> shutdown (i.e. fakeraid DM stuff) and it would be really nice if the
> services in question could just mark that on their cgroup, by setting an
> xattr. On the more desktopy side of things there are other
> possibilities: for example there are plans defining what an application
> is along the lines of a cgroup (i.e. an app being a collection of
> processes). With xattrs one could then attach an icon or human readable
> program name on the cgroup.
>
> The key idea is that this would allow attaching runtime meta information
> to cgroups and everything they model (services, apps, vms), that doesn't
> need any complex userspace infrastructure, has good access control
> (i.e. because the file system enforces that anyway, and there's the
> "trusted." xattr namespace), notifications (inotify), and can easily be
> shared among applications.
>
> Lennart
v7:
- no changes
v6:
- remove user xattr namespace, only allow trusted and security
v5:
- check for capabilities before setting/removing xattrs
v4:
- no changes
v3:
- instead of config option, use mount option to enable xattr support
Original-patch-by: Li Zefan <lizefan@huawei.com>
Cc: Li Zefan <lizefan@huawei.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Hillf Danton <dhillf@gmail.com>
Cc: Lennart Poettering <lpoetter@redhat.com>
Signed-off-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Aristeu Rozanski <aris@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
											 
										 
										
											2012-08-23 16:53:30 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									simple_xattrs_init ( & cgrp - > xattrs ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2008-10-18 20:28:04 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:19 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								static  void  init_cgroup_root ( struct  cgroupfs_root  * root )  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:40:44 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									struct  cgroup  * cgrp  =  & root - > top_cgroup ; 
							 
						 
					
						
							
								
									
										
										
										
											2012-04-01 12:09:54 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
									INIT_LIST_HEAD ( & root - > subsys_list ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									INIT_LIST_HEAD ( & root - > root_list ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2012-04-01 12:09:54 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									INIT_LIST_HEAD ( & root - > allcg_list ) ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
									root - > number_of_cgroups  =  1 ; 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:40:44 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									cgrp - > root  =  root ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									cgrp - > top_cgroup  =  cgrp ; 
							 
						 
					
						
							
								
									
										
										
										
											2008-10-18 20:28:04 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									init_cgroup_housekeeping ( cgrp ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2012-11-28 17:15:21 +08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									list_add_tail ( & cgrp - > allcg_node ,  & root - > allcg_list ) ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:23 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								static  bool  init_root_id ( struct  cgroupfs_root  * root )  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									int  ret  =  0 ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									do  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										if  ( ! ida_pre_get ( & hierarchy_ida ,  GFP_KERNEL ) ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											return  false ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										spin_lock ( & hierarchy_id_lock ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										/* Try to allocate the next unused ID */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										ret  =  ida_get_new_above ( & hierarchy_ida ,  next_hierarchy_id , 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
													& root - > hierarchy_id ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										if  ( ret  = =  - ENOSPC ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											/* Try again starting from 0 */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											ret  =  ida_get_new ( & hierarchy_ida ,  & root - > hierarchy_id ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										if  ( ! ret )  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											next_hierarchy_id  =  root - > hierarchy_id  +  1 ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										}  else  if  ( ret  ! =  - EAGAIN )  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											/* Can only get here if the 31-bit IDR is full ... */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											BUG_ON ( ret ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										} 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										spin_unlock ( & hierarchy_id_lock ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									}  while  ( ret ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									return  true ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								static  int  cgroup_test_super ( struct  super_block  * sb ,  void  * data )  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:19 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									struct  cgroup_sb_opts  * opts  =  data ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
									struct  cgroupfs_root  * root  =  sb - > s_fs_info ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:19 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									/* If we asked for a name then it must match */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									if  ( opts - > name  & &  strcmp ( opts - > name ,  root - > name ) ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										return  0 ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:23 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									/*
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  If  we  asked  for  subsystems  ( or  explicitly  for  no 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  subsystems )  then  they  must  match 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 */ 
							 
						 
					
						
							
								
									
										
										
										
											2012-08-23 16:53:31 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									if  ( ( opts - > subsys_mask  | |  opts - > none ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									    & &  ( opts - > subsys_mask  ! =  root - > subsys_mask ) ) 
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
										return  0 ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									return  1 ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:19 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								static  struct  cgroupfs_root  * cgroup_root_from_opts ( struct  cgroup_sb_opts  * opts )  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									struct  cgroupfs_root  * root ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2012-08-23 16:53:31 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									if  ( ! opts - > subsys_mask  & &  ! opts - > none ) 
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:19 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										return  NULL ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									root  =  kzalloc ( sizeof ( * root ) ,  GFP_KERNEL ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									if  ( ! root ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										return  ERR_PTR ( - ENOMEM ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:23 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									if  ( ! init_root_id ( root ) )  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										kfree ( root ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										return  ERR_PTR ( - ENOMEM ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									} 
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:19 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									init_cgroup_root ( root ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:23 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2012-08-23 16:53:31 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									root - > subsys_mask  =  opts - > subsys_mask ; 
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:19 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									root - > flags  =  opts - > flags ; 
							 
						 
					
						
							
								
									
										
										
										
											2012-11-19 09:02:12 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									ida_init ( & root - > cgroup_ida ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:19 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									if  ( opts - > release_agent ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										strcpy ( root - > release_agent_path ,  opts - > release_agent ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									if  ( opts - > name ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										strcpy ( root - > name ,  opts - > name ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2012-11-19 08:13:38 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									if  ( opts - > cpuset_clone_children ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										set_bit ( CGRP_CPUSET_CLONE_CHILDREN ,  & root - > top_cgroup . flags ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:19 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									return  root ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:23 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								static  void  cgroup_drop_root ( struct  cgroupfs_root  * root )  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									if  ( ! root ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										return ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									BUG_ON ( ! root - > hierarchy_id ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									spin_lock ( & hierarchy_id_lock ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									ida_remove ( & hierarchy_ida ,  root - > hierarchy_id ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									spin_unlock ( & hierarchy_id_lock ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2012-11-19 09:02:12 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									ida_destroy ( & root - > cgroup_ida ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:23 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									kfree ( root ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								static  int  cgroup_set_super ( struct  super_block  * sb ,  void  * data )  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									int  ret ; 
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:19 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									struct  cgroup_sb_opts  * opts  =  data ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									/* If we don't have a new root, we can't set up a new sb */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									if  ( ! opts - > new_root ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										return  - EINVAL ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2012-08-23 16:53:31 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									BUG_ON ( ! opts - > subsys_mask  & &  ! opts - > none ) ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									ret  =  set_anon_super ( sb ,  NULL ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									if  ( ret ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										return  ret ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:19 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									sb - > s_fs_info  =  opts - > new_root ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									opts - > new_root - > sb  =  sb ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									sb - > s_blocksize  =  PAGE_CACHE_SIZE ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									sb - > s_blocksize_bits  =  PAGE_CACHE_SHIFT ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									sb - > s_magic  =  CGROUP_SUPER_MAGIC ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									sb - > s_op  =  & cgroup_ops ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									return  0 ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								static  int  cgroup_get_rootdir ( struct  super_block  * sb )  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
									
										
										
										
											2010-12-21 13:29:29 -05:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									static  const  struct  dentry_operations  cgroup_dops  =  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										. d_iput  =  cgroup_diput , 
							 
						 
					
						
							
								
									
										
										
										
											2011-01-14 05:31:45 +00:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										. d_delete  =  cgroup_delete , 
							 
						 
					
						
							
								
									
										
										
										
											2010-12-21 13:29:29 -05:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									} ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
									struct  inode  * inode  = 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										cgroup_new_inode ( S_IFDIR  |  S_IRUGO  |  S_IXUGO  |  S_IWUSR ,  sb ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									if  ( ! inode ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										return  - ENOMEM ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									inode - > i_fop  =  & simple_dir_operations ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									inode - > i_op  =  & cgroup_dir_inode_operations ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									/* directories start off with i_nlink == 2 (for "." entry) */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									inc_nlink ( inode ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2012-01-08 22:15:13 -05:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									sb - > s_root  =  d_make_root ( inode ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									if  ( ! sb - > s_root ) 
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
										return  - ENOMEM ; 
							 
						 
					
						
							
								
									
										
										
										
											2010-12-21 13:29:29 -05:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									/* for everything else we want ->d_op set */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									sb - > s_d_op  =  & cgroup_dops ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
									return  0 ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2010-07-26 13:23:11 +04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								static  struct  dentry  * cgroup_mount ( struct  file_system_type  * fs_type ,  
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
											 int  flags ,  const  char  * unused_dev_name , 
							 
						 
					
						
							
								
									
										
										
										
											2010-07-26 13:23:11 +04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
											 void  * data ) 
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									struct  cgroup_sb_opts  opts ; 
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:19 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									struct  cgroupfs_root  * root ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
									int  ret  =  0 ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									struct  super_block  * sb ; 
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:19 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									struct  cgroupfs_root  * new_root ; 
							 
						 
					
						
							
								
									
										
										
										
											2011-12-12 18:12:21 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									struct  inode  * inode ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									/* First find the desired set of subsystems */ 
							 
						 
					
						
							
								
									
										
											 
										
											
												cgroups: revamp subsys array
This patch series provides the ability for cgroup subsystems to be
compiled as modules both within and outside the kernel tree.  This is
mainly useful for classifiers and subsystems that hook into components
that are already modules.  cls_cgroup and blkio-cgroup serve as the
example use cases for this feature.
It provides an interface cgroup_load_subsys() and cgroup_unload_subsys()
which modular subsystems can use to register and depart during runtime.
The net_cls classifier subsystem serves as the example for a subsystem
which can be converted into a module using these changes.
Patch #1 sets up the subsys[] array so its contents can be dynamic as
modules appear and (eventually) disappear.  Iterations over the array are
modified to handle when subsystems are absent, and the dynamic section of
the array is protected by cgroup_mutex.
Patch #2 implements an interface for modules to load subsystems, called
cgroup_load_subsys, similar to cgroup_init_subsys, and adds a module
pointer in struct cgroup_subsys.
Patch #3 adds a mechanism for unloading modular subsystems, which includes
a more advanced rework of the rudimentary reference counting introduced in
patch 2.
Patch #4 modifies the net_cls subsystem, which already had some module
declarations, to be configurable as a module, which also serves as a
simple proof-of-concept.
Part of implementing patches 2 and 4 involved updating css pointers in
each css_set when the module appears or leaves.  In doing this, it was
discovered that css_sets always remain linked to the dummy cgroup,
regardless of whether or not any subsystems are actually bound to it
(i.e., not mounted on an actual hierarchy).  The subsystem loading and
unloading code therefore should keep in mind the special cases where the
added subsystem is the only one in the dummy cgroup (and therefore all
css_sets need to be linked back into it) and where the removed subsys was
the only one in the dummy cgroup (and therefore all css_sets should be
unlinked from it) - however, as all css_sets always stay attached to the
dummy cgroup anyway, these cases are ignored.  Any fix that addresses this
issue should also make sure these cases are addressed in the subsystem
loading and unloading code.
This patch:
Make subsys[] able to be dynamically populated to support modular
subsystems
This patch reworks the way the subsys[] array is used so that subsystems
can register themselves after boot time, and enables the internals of
cgroups to be able to handle when subsystems are not present or may
appear/disappear.
Signed-off-by: Ben Blum <bblum@andrew.cmu.edu>
Acked-by: Li Zefan <lizf@cn.fujitsu.com>
Cc: Paul Menage <menage@google.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2010-03-10 15:22:07 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									mutex_lock ( & cgroup_mutex ) ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
									ret  =  parse_cgroupfs_options ( data ,  & opts ) ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												cgroups: revamp subsys array
This patch series provides the ability for cgroup subsystems to be
compiled as modules both within and outside the kernel tree.  This is
mainly useful for classifiers and subsystems that hook into components
that are already modules.  cls_cgroup and blkio-cgroup serve as the
example use cases for this feature.
It provides an interface cgroup_load_subsys() and cgroup_unload_subsys()
which modular subsystems can use to register and depart during runtime.
The net_cls classifier subsystem serves as the example for a subsystem
which can be converted into a module using these changes.
Patch #1 sets up the subsys[] array so its contents can be dynamic as
modules appear and (eventually) disappear.  Iterations over the array are
modified to handle when subsystems are absent, and the dynamic section of
the array is protected by cgroup_mutex.
Patch #2 implements an interface for modules to load subsystems, called
cgroup_load_subsys, similar to cgroup_init_subsys, and adds a module
pointer in struct cgroup_subsys.
Patch #3 adds a mechanism for unloading modular subsystems, which includes
a more advanced rework of the rudimentary reference counting introduced in
patch 2.
Patch #4 modifies the net_cls subsystem, which already had some module
declarations, to be configurable as a module, which also serves as a
simple proof-of-concept.
Part of implementing patches 2 and 4 involved updating css pointers in
each css_set when the module appears or leaves.  In doing this, it was
discovered that css_sets always remain linked to the dummy cgroup,
regardless of whether or not any subsystems are actually bound to it
(i.e., not mounted on an actual hierarchy).  The subsystem loading and
unloading code therefore should keep in mind the special cases where the
added subsystem is the only one in the dummy cgroup (and therefore all
css_sets need to be linked back into it) and where the removed subsys was
the only one in the dummy cgroup (and therefore all css_sets should be
unlinked from it) - however, as all css_sets always stay attached to the
dummy cgroup anyway, these cases are ignored.  Any fix that addresses this
issue should also make sure these cases are addressed in the subsystem
loading and unloading code.
This patch:
Make subsys[] able to be dynamically populated to support modular
subsystems
This patch reworks the way the subsys[] array is used so that subsystems
can register themselves after boot time, and enables the internals of
cgroups to be able to handle when subsystems are not present or may
appear/disappear.
Signed-off-by: Ben Blum <bblum@andrew.cmu.edu>
Acked-by: Li Zefan <lizf@cn.fujitsu.com>
Cc: Paul Menage <menage@google.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2010-03-10 15:22:07 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									mutex_unlock ( & cgroup_mutex ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:19 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									if  ( ret ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										goto  out_err ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:19 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									/*
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  Allocate  a  new  cgroup  root .  We  may  not  need  it  if  we ' re 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  reusing  an  existing  hierarchy . 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									new_root  =  cgroup_root_from_opts ( & opts ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									if  ( IS_ERR ( new_root ) )  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										ret  =  PTR_ERR ( new_root ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2010-03-10 15:22:09 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										goto  drop_modules ; 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:38 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									} 
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:19 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									opts . new_root  =  new_root ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:19 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									/* Locate an existing or new sb for this hierarchy */ 
							 
						 
					
						
							
								
									
										
										
										
											2012-06-25 12:55:37 +01:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									sb  =  sget ( fs_type ,  cgroup_test_super ,  cgroup_set_super ,  0 ,  & opts ) ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
									if  ( IS_ERR ( sb ) )  { 
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:19 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										ret  =  PTR_ERR ( sb ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:23 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										cgroup_drop_root ( opts . new_root ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2010-03-10 15:22:09 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										goto  drop_modules ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
									} 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:19 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									root  =  sb - > s_fs_info ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									BUG_ON ( ! root ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									if  ( root  = =  opts . new_root )  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										/* We used the new root structure, so this is a new hierarchy */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										struct  list_head  tmp_cg_links ; 
							 
						 
					
						
							
								
									
										
										
										
											2009-01-07 18:07:42 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										struct  cgroup  * root_cgrp  =  & root - > top_cgroup ; 
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:19 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										struct  cgroupfs_root  * existing_root ; 
							 
						 
					
						
							
								
									
										
										
										
											2011-06-02 21:20:51 +10:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										const  struct  cred  * cred ; 
							 
						 
					
						
							
								
									
										
										
										
											2008-04-29 01:00:13 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										int  i ; 
							 
						 
					
						
							
								
									
										
										
										
											2013-01-10 11:49:27 +08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										struct  css_set  * cg ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										BUG_ON ( sb - > s_root  ! =  NULL ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										ret  =  cgroup_get_rootdir ( sb ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										if  ( ret ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											goto  drop_new_super ; 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:36 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										inode  =  sb - > s_root - > d_inode ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:36 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										mutex_lock ( & inode - > i_mutex ) ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
										mutex_lock ( & cgroup_mutex ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2011-12-12 18:12:21 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										mutex_lock ( & cgroup_root_mutex ) ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2011-12-12 18:12:21 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										/* Check for name clashes with existing mounts */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										ret  =  - EBUSY ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										if  ( strlen ( root - > name ) ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											for_each_active_root ( existing_root ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
												if  ( ! strcmp ( existing_root - > name ,  root - > name ) ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
													goto  unlock_drop ; 
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:19 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:36 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										/*
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										 *  We ' re  accessing  css_set_count  without  locking 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										 *  css_set_lock  here ,  but  that ' s  OK  -  it  can  only  be 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										 *  increased  by  someone  holding  cgroup_lock ,  and 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										 *  that ' s  us .  The  worst  that  can  happen  is  that  we 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										 *  have  some  link  structures  left  over 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										 */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										ret  =  allocate_cg_links ( css_set_count ,  & tmp_cg_links ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2011-12-12 18:12:21 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										if  ( ret ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											goto  unlock_drop ; 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:36 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2012-08-23 16:53:31 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										ret  =  rebind_subsystems ( root ,  root - > subsys_mask ) ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
										if  ( ret  = =  - EBUSY )  { 
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:19 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
											free_cg_links ( & tmp_cg_links ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2011-12-12 18:12:21 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
											goto  unlock_drop ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
										} 
							 
						 
					
						
							
								
									
										
										
										
											2010-03-10 15:22:09 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										/*
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										 *  There  must  be  no  failure  case  after  here ,  since  rebinding 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										 *  takes  care  of  subsystems '  refcounts ,  which  are  explicitly 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										 *  dropped  in  the  failure  exit  path . 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										 */ 
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										/* EBUSY should be the only error here */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										BUG_ON ( ret ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										list_add ( & root - > root_list ,  & roots ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:36 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										root_count + + ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2009-01-07 18:07:42 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										sb - > s_root - > d_fsdata  =  root_cgrp ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
										root - > top_cgroup . dentry  =  sb - > s_root ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:36 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										/* Link the top cgroup in this hierarchy into all
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										 *  the  css_set  objects  */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										write_lock ( & css_set_lock ) ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												hlist: drop the node parameter from iterators
I'm not sure why, but the hlist for each entry iterators were conceived
        list_for_each_entry(pos, head, member)
The hlist ones were greedy and wanted an extra parameter:
        hlist_for_each_entry(tpos, pos, head, member)
Why did they need an extra pos parameter? I'm not quite sure. Not only
they don't really need it, it also prevents the iterator from looking
exactly like the list iterator, which is unfortunate.
Besides the semantic patch, there was some manual work required:
 - Fix up the actual hlist iterators in linux/list.h
 - Fix up the declaration of other iterators based on the hlist ones.
 - A very small amount of places were using the 'node' parameter, this
 was modified to use 'obj->member' instead.
 - Coccinelle didn't handle the hlist_for_each_entry_safe iterator
 properly, so those had to be fixed up manually.
The semantic patch which is mostly the work of Peter Senna Tschudin is here:
@@
iterator name hlist_for_each_entry, hlist_for_each_entry_continue, hlist_for_each_entry_from, hlist_for_each_entry_rcu, hlist_for_each_entry_rcu_bh, hlist_for_each_entry_continue_rcu_bh, for_each_busy_worker, ax25_uid_for_each, ax25_for_each, inet_bind_bucket_for_each, sctp_for_each_hentry, sk_for_each, sk_for_each_rcu, sk_for_each_from, sk_for_each_safe, sk_for_each_bound, hlist_for_each_entry_safe, hlist_for_each_entry_continue_rcu, nr_neigh_for_each, nr_neigh_for_each_safe, nr_node_for_each, nr_node_for_each_safe, for_each_gfn_indirect_valid_sp, for_each_gfn_sp, for_each_host;
type T;
expression a,c,d,e;
identifier b;
statement S;
@@
-T b;
    <+... when != b
(
hlist_for_each_entry(a,
- b,
c, d) S
|
hlist_for_each_entry_continue(a,
- b,
c) S
|
hlist_for_each_entry_from(a,
- b,
c) S
|
hlist_for_each_entry_rcu(a,
- b,
c, d) S
|
hlist_for_each_entry_rcu_bh(a,
- b,
c, d) S
|
hlist_for_each_entry_continue_rcu_bh(a,
- b,
c) S
|
for_each_busy_worker(a, c,
- b,
d) S
|
ax25_uid_for_each(a,
- b,
c) S
|
ax25_for_each(a,
- b,
c) S
|
inet_bind_bucket_for_each(a,
- b,
c) S
|
sctp_for_each_hentry(a,
- b,
c) S
|
sk_for_each(a,
- b,
c) S
|
sk_for_each_rcu(a,
- b,
c) S
|
sk_for_each_from
-(a, b)
+(a)
S
+ sk_for_each_from(a) S
|
sk_for_each_safe(a,
- b,
c, d) S
|
sk_for_each_bound(a,
- b,
c) S
|
hlist_for_each_entry_safe(a,
- b,
c, d, e) S
|
hlist_for_each_entry_continue_rcu(a,
- b,
c) S
|
nr_neigh_for_each(a,
- b,
c) S
|
nr_neigh_for_each_safe(a,
- b,
c, d) S
|
nr_node_for_each(a,
- b,
c) S
|
nr_node_for_each_safe(a,
- b,
c, d) S
|
- for_each_gfn_sp(a, c, d, b) S
+ for_each_gfn_sp(a, c, d) S
|
- for_each_gfn_indirect_valid_sp(a, c, d, b) S
+ for_each_gfn_indirect_valid_sp(a, c, d) S
|
for_each_host(a,
- b,
c) S
|
for_each_host_safe(a,
- b,
c, d) S
|
for_each_mesh_entry(a,
- b,
c, d) S
)
    ...+>
[akpm@linux-foundation.org: drop bogus change from net/ipv4/raw.c]
[akpm@linux-foundation.org: drop bogus hunk from net/ipv6/raw.c]
[akpm@linux-foundation.org: checkpatch fixes]
[akpm@linux-foundation.org: fix warnings]
[akpm@linux-foudnation.org: redo intrusive kvm changes]
Tested-by: Peter Senna Tschudin <peter.senna@gmail.com>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2013-02-27 17:06:00 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										hash_for_each ( css_set_table ,  i ,  cg ,  hlist ) 
							 
						 
					
						
							
								
									
										
										
										
											2013-01-10 11:49:27 +08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
											link_css_set ( & tmp_cg_links ,  cg ,  root_cgrp ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:36 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										write_unlock ( & css_set_lock ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										free_cg_links ( & tmp_cg_links ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2009-01-07 18:07:42 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										BUG_ON ( ! list_empty ( & root_cgrp - > children ) ) ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
										BUG_ON ( root - > number_of_cgroups  ! =  1 ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2011-06-02 21:20:51 +10:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										cred  =  override_creds ( & init_cred ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2012-08-23 16:53:31 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										cgroup_populate_dir ( root_cgrp ,  true ,  root - > subsys_mask ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2011-06-02 21:20:51 +10:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										revert_creds ( cred ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2011-12-12 18:12:21 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										mutex_unlock ( & cgroup_root_mutex ) ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
										mutex_unlock ( & cgroup_mutex ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:18 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										mutex_unlock ( & inode - > i_mutex ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:19 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									}  else  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										/*
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										 *  We  re - used  an  existing  hierarchy  -  the  new  root  ( if 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										 *  any )  is  not  needed 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										 */ 
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:23 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										cgroup_drop_root ( opts . new_root ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2010-03-10 15:22:09 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										/* no subsys rebinding, so refcounts don't change */ 
							 
						 
					
						
							
								
									
										
										
										
											2012-08-23 16:53:31 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										drop_parsed_module_refcounts ( opts . subsys_mask ) ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
									} 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:19 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									kfree ( opts . release_agent ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									kfree ( opts . name ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2010-07-26 13:23:11 +04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									return  dget ( sb - > s_root ) ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2011-12-12 18:12:21 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								 unlock_drop : 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									mutex_unlock ( & cgroup_root_mutex ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									mutex_unlock ( & cgroup_mutex ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									mutex_unlock ( & inode - > i_mutex ) ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								 drop_new_super : 
							 
						 
					
						
							
								
									
										
										
										
											2009-05-06 01:34:22 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									deactivate_locked_super ( sb ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2010-03-10 15:22:09 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								 drop_modules : 
							 
						 
					
						
							
								
									
										
										
										
											2012-08-23 16:53:31 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									drop_parsed_module_refcounts ( opts . subsys_mask ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:19 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								 out_err : 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									kfree ( opts . release_agent ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									kfree ( opts . name ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2010-07-26 13:23:11 +04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									return  ERR_PTR ( ret ) ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								static  void  cgroup_kill_sb ( struct  super_block  * sb )  {  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									struct  cgroupfs_root  * root  =  sb - > s_fs_info ; 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:40:44 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									struct  cgroup  * cgrp  =  & root - > top_cgroup ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
									int  ret ; 
							 
						 
					
						
							
								
									
										
										
										
											2008-07-25 01:46:55 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									struct  cg_cgroup_link  * link ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									struct  cg_cgroup_link  * saved_link ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									BUG_ON ( ! root ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									BUG_ON ( root - > number_of_cgroups  ! =  1 ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:40:44 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									BUG_ON ( ! list_empty ( & cgrp - > children ) ) ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									mutex_lock ( & cgroup_mutex ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2011-12-12 18:12:21 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									mutex_lock ( & cgroup_root_mutex ) ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									/* Rebind all subsystems back to the default hierarchy */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									ret  =  rebind_subsystems ( root ,  0 ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									/* Shouldn't be able to fail ... */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									BUG_ON ( ret ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:36 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									/*
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  Release  all  the  links  from  css_sets  to  this  hierarchy ' s 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  root  cgroup 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									write_lock ( & css_set_lock ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2008-07-25 01:46:55 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									list_for_each_entry_safe ( link ,  saved_link ,  & cgrp - > css_sets , 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
												 cgrp_link_list )  { 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:36 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										list_del ( & link - > cg_link_list ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:40:44 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										list_del ( & link - > cgrp_link_list ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:36 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										kfree ( link ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									} 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									write_unlock ( & css_set_lock ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2009-01-29 14:25:22 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									if  ( ! list_empty ( & root - > root_list ) )  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										list_del ( & root - > root_list ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										root_count - - ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									} 
							 
						 
					
						
							
								
									
										
										
										
											2009-01-07 18:07:41 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2011-12-12 18:12:21 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									mutex_unlock ( & cgroup_root_mutex ) ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
									mutex_unlock ( & cgroup_mutex ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
											 
										
											
												cgroup: add xattr support
This is one of the items in the plumber's wish list.
For use cases:
>> What would the use case be for this?
>
> Attaching meta information to services, in an easily discoverable
> way. For example, in systemd we create one cgroup for each service, and
> could then store data like the main pid of the specific service as an
> xattr on the cgroup itself. That way we'd have almost all service state
> in the cgroupfs, which would make it possible to terminate systemd and
> later restart it without losing any state information. But there's more:
> for example, some very peculiar services cannot be terminated on
> shutdown (i.e. fakeraid DM stuff) and it would be really nice if the
> services in question could just mark that on their cgroup, by setting an
> xattr. On the more desktopy side of things there are other
> possibilities: for example there are plans defining what an application
> is along the lines of a cgroup (i.e. an app being a collection of
> processes). With xattrs one could then attach an icon or human readable
> program name on the cgroup.
>
> The key idea is that this would allow attaching runtime meta information
> to cgroups and everything they model (services, apps, vms), that doesn't
> need any complex userspace infrastructure, has good access control
> (i.e. because the file system enforces that anyway, and there's the
> "trusted." xattr namespace), notifications (inotify), and can easily be
> shared among applications.
>
> Lennart
v7:
- no changes
v6:
- remove user xattr namespace, only allow trusted and security
v5:
- check for capabilities before setting/removing xattrs
v4:
- no changes
v3:
- instead of config option, use mount option to enable xattr support
Original-patch-by: Li Zefan <lizefan@huawei.com>
Cc: Li Zefan <lizefan@huawei.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Hillf Danton <dhillf@gmail.com>
Cc: Lennart Poettering <lpoetter@redhat.com>
Signed-off-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Aristeu Rozanski <aris@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
											 
										 
										
											2012-08-23 16:53:30 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									simple_xattrs_free ( & cgrp - > xattrs ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
									kill_litter_super ( sb ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:23 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									cgroup_drop_root ( root ) ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								static  struct  file_system_type  cgroup_fs_type  =  {  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									. name  =  " cgroup " , 
							 
						 
					
						
							
								
									
										
										
										
											2010-07-26 13:23:11 +04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									. mount  =  cgroup_mount , 
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
									. kill_sb  =  cgroup_kill_sb , 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								} ;  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2010-08-05 13:53:35 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								static  struct  kobject  * cgroup_kobj ;  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2008-02-23 15:24:09 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								/**
  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  cgroup_path  -  generate  the  path  of  a  cgroup 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  @ cgrp :  the  cgroup  in  question 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  @ buf :  the  buffer  to  write  the  path  into 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  @ buflen :  the  length  of  the  buffer 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 * 
							 
						 
					
						
							
								
									
										
										
										
											2009-01-07 18:07:44 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								 *  Called  with  cgroup_mutex  held  or  else  with  an  RCU - protected  cgroup 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  reference .   Writes  path  of  cgroup  into  buf .   Returns  0  on  success , 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  - errno  on  error . 
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								 */ 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:40:44 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								int  cgroup_path ( const  struct  cgroup  * cgrp ,  char  * buf ,  int  buflen )  
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
									
										
										
										
											2012-11-19 08:13:36 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									struct  dentry  * dentry  =  cgrp - > dentry ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
									char  * start ; 
							 
						 
					
						
							
								
									
										
										
										
											2012-11-19 08:13:36 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									rcu_lockdep_assert ( rcu_read_lock_held ( )  | |  cgroup_lock_is_held ( ) , 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											   " cgroup_path() called without proper locking " ) ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2013-01-24 14:30:22 +08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									if  ( cgrp  = =  dummytop )  { 
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
										/*
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										 *  Inactive  subsystems  have  no  dentry  for  their  root 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										 *  cgroup 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										 */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										strcpy ( buf ,  " / " ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										return  0 ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									} 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2012-11-08 21:36:38 +08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									start  =  buf  +  buflen  -  1 ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2012-11-08 21:36:38 +08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									* start  =  ' \0 ' ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
									for  ( ; ; )  { 
							 
						 
					
						
							
								
									
										
										
										
											2009-01-07 18:07:44 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										int  len  =  dentry - > d_name . len ; 
							 
						 
					
						
							
								
									
										
										
										
											2010-04-22 17:29:24 +08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
										if  ( ( start  - =  len )  <  buf ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											return  - ENAMETOOLONG ; 
							 
						 
					
						
							
								
									
										
										
										
											2010-04-22 17:29:24 +08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										memcpy ( start ,  dentry - > d_name . name ,  len ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:40:44 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										cgrp  =  cgrp - > parent ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										if  ( ! cgrp ) 
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
											break ; 
							 
						 
					
						
							
								
									
										
										
										
											2010-04-22 17:29:24 +08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2012-11-19 08:13:36 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										dentry  =  cgrp - > dentry ; 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:40:44 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										if  ( ! cgrp - > parent ) 
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
											continue ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										if  ( - - start  <  buf ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											return  - ENAMETOOLONG ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										* start  =  ' / ' ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									} 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									memmove ( buf ,  start ,  buf  +  buflen  -  start ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									return  0 ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
									
										
										
										
											2010-03-10 15:22:11 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								EXPORT_SYMBOL_GPL ( cgroup_path ) ;  
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2011-12-12 18:12:21 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								/*
  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  Control  Group  taskset 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 */ 
							 
						 
					
						
							
								
									
										
										
										
											2011-12-12 18:12:21 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								struct  task_and_cgroup  {  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									struct  task_struct 	* task ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									struct  cgroup 		* cgrp ; 
							 
						 
					
						
							
								
									
										
										
										
											2012-01-30 12:51:56 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									struct  css_set 		* cg ; 
							 
						 
					
						
							
								
									
										
										
										
											2011-12-12 18:12:21 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								} ;  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2011-12-12 18:12:21 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								struct  cgroup_taskset  {  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									struct  task_and_cgroup 	single ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									struct  flex_array 	* tc_array ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									int 			tc_array_len ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									int 			idx ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									struct  cgroup 		* cur_cgrp ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								} ;  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								/**
  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  cgroup_taskset_first  -  reset  taskset  and  return  the  first  task 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  @ tset :  taskset  of  interest 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 * 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  @ tset  iteration  is  initialized  and  the  first  task  is  returned . 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								struct  task_struct  * cgroup_taskset_first ( struct  cgroup_taskset  * tset )  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									if  ( tset - > tc_array )  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										tset - > idx  =  0 ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										return  cgroup_taskset_next ( tset ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									}  else  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										tset - > cur_cgrp  =  tset - > single . cgrp ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										return  tset - > single . task ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									} 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								EXPORT_SYMBOL_GPL ( cgroup_taskset_first ) ;  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								/**
  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  cgroup_taskset_next  -  iterate  to  the  next  task  in  taskset 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  @ tset :  taskset  of  interest 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 * 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  Return  the  next  task  in  @ tset .   Iteration  must  have  been  initialized 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  with  cgroup_taskset_first ( ) . 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								struct  task_struct  * cgroup_taskset_next ( struct  cgroup_taskset  * tset )  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									struct  task_and_cgroup  * tc ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									if  ( ! tset - > tc_array  | |  tset - > idx  > =  tset - > tc_array_len ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										return  NULL ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									tc  =  flex_array_get ( tset - > tc_array ,  tset - > idx + + ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									tset - > cur_cgrp  =  tc - > cgrp ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									return  tc - > task ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								EXPORT_SYMBOL_GPL ( cgroup_taskset_next ) ;  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								/**
  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  cgroup_taskset_cur_cgroup  -  return  the  matching  cgroup  for  the  current  task 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  @ tset :  taskset  of  interest 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 * 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  Return  the  cgroup  for  the  current  ( last  returned )  task  of  @ tset .   This 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  function  must  be  preceded  by  either  cgroup_taskset_first ( )  or 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  cgroup_taskset_next ( ) . 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								struct  cgroup  * cgroup_taskset_cur_cgroup ( struct  cgroup_taskset  * tset )  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									return  tset - > cur_cgrp ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								EXPORT_SYMBOL_GPL ( cgroup_taskset_cur_cgroup ) ;  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								/**
  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  cgroup_taskset_size  -  return  the  number  of  tasks  in  taskset 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  @ tset :  taskset  of  interest 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								int  cgroup_taskset_size ( struct  cgroup_taskset  * tset )  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									return  tset - > tc_array  ?  tset - > tc_array_len  :  1 ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								EXPORT_SYMBOL_GPL ( cgroup_taskset_size ) ;  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2011-05-26 16:25:20 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								/*
  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  cgroup_task_migrate  -  move  a  task  from  one  cgroup  to  another . 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 * 
							 
						 
					
						
							
								
									
										
										
										
											2012-11-20 22:06:18 +08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								 *  Must  be  called  with  cgroup_mutex  and  threadgroup  locked . 
							 
						 
					
						
							
								
									
										
										
										
											2011-05-26 16:25:20 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								 */ 
							 
						 
					
						
							
								
									
										
										
										
											2012-01-30 12:51:56 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								static  void  cgroup_task_migrate ( struct  cgroup  * cgrp ,  struct  cgroup  * oldcgrp ,  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
												struct  task_struct  * tsk ,  struct  css_set  * newcg ) 
							 
						 
					
						
							
								
									
										
										
										
											2011-05-26 16:25:20 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									struct  css_set  * oldcg ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									/*
 
							 
						 
					
						
							
								
									
										
										
										
											2011-12-21 20:18:35 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									 *  We  are  synchronized  through  threadgroup_lock ( )  against  PF_EXITING 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  setting  such  that  we  can ' t  race  against  cgroup_exit ( )  changing  the 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  css_set  to  init_css_set  and  dropping  the  old  one . 
							 
						 
					
						
							
								
									
										
										
										
											2011-05-26 16:25:20 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									 */ 
							 
						 
					
						
							
								
									
										
										
										
											2011-12-21 20:03:18 +01:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									WARN_ON_ONCE ( tsk - > flags  &  PF_EXITING ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2011-05-26 16:25:20 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									oldcg  =  tsk - > cgroups ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									task_lock ( tsk ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									rcu_assign_pointer ( tsk - > cgroups ,  newcg ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									task_unlock ( tsk ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									/* Update the css_set linked lists if we're using them */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									write_lock ( & css_set_lock ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									if  ( ! list_empty ( & tsk - > cg_list ) ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										list_move ( & tsk - > cg_list ,  & newcg - > tasks ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									write_unlock ( & css_set_lock ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									/*
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  We  just  gained  a  reference  on  oldcg  by  taking  it  from  the  task .  As 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  trading  it  for  newcg  is  protected  by  cgroup_mutex ,  we ' re  safe  to  drop 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  it  here ;  it  will  be  freed  under  RCU . 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									set_bit ( CGRP_RELEASABLE ,  & oldcgrp - > flags ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2012-10-04 16:37:16 +09:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									put_css_set ( oldcg ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2011-05-26 16:25:20 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2008-02-23 15:24:09 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								/**
  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  cgroup_attach_task  -  attach  task  ' tsk '  to  cgroup  ' cgrp ' 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  @ cgrp :  the  cgroup  the  task  is  attaching  to 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  @ tsk :  the  task  to  be  attached 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:32 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								 * 
							 
						 
					
						
							
								
									
										
										
										
											2011-12-12 18:12:21 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								 *  Call  with  cgroup_mutex  and  threadgroup  locked .  May  take  task_lock  of 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  @ tsk  during  call . 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:32 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								 */ 
							 
						 
					
						
							
								
									
										
										
										
											2008-02-07 00:14:43 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								int  cgroup_attach_task ( struct  cgroup  * cgrp ,  struct  task_struct  * tsk )  
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:32 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
									
										
										
										
											2012-03-29 22:03:33 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									int  retval  =  0 ; 
							 
						 
					
						
							
								
									
										
										
										
											2010-03-10 15:22:03 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									struct  cgroup_subsys  * ss ,  * failed_ss  =  NULL ; 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:40:44 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									struct  cgroup  * oldcgrp ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									struct  cgroupfs_root  * root  =  cgrp - > root ; 
							 
						 
					
						
							
								
									
										
										
										
											2011-12-12 18:12:21 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									struct  cgroup_taskset  tset  =  {  } ; 
							 
						 
					
						
							
								
									
										
										
										
											2012-01-30 12:51:56 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									struct  css_set  * newcg ; 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:32 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2011-12-12 18:12:21 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									/* @tsk either already exited or can't exit until the end */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									if  ( tsk - > flags  &  PF_EXITING ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										return  - ESRCH ; 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:32 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									/* Nothing to do if the task is already in that cgroup */ 
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:22 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									oldcgrp  =  task_cgroup_from_root ( tsk ,  root ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:40:44 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									if  ( cgrp  = =  oldcgrp ) 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:32 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										return  0 ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2011-12-12 18:12:21 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									tset . single . task  =  tsk ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									tset . single . cgrp  =  oldcgrp ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:32 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									for_each_subsys ( root ,  ss )  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										if  ( ss - > can_attach )  { 
							 
						 
					
						
							
								
									
										
										
										
											2012-01-31 13:47:36 +08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
											retval  =  ss - > can_attach ( cgrp ,  & tset ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2010-03-10 15:22:03 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
											if  ( retval )  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
												/*
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
												 *  Remember  on  which  subsystem  the  can_attach ( ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
												 *  failed ,  so  that  we  only  call  cancel_attach ( ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
												 *  against  the  subsystems  whose  can_attach ( ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
												 *  succeeded .  ( See  below ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
												 */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
												failed_ss  =  ss ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
												goto  out ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											} 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:32 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										} 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									} 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2012-01-30 12:51:56 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									newcg  =  find_css_set ( tsk - > cgroups ,  cgrp ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									if  ( ! newcg )  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										retval  =  - ENOMEM ; 
							 
						 
					
						
							
								
									
										
										
										
											2010-03-10 15:22:03 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										goto  out ; 
							 
						 
					
						
							
								
									
										
										
										
											2012-01-30 12:51:56 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									} 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									cgroup_task_migrate ( cgrp ,  oldcgrp ,  tsk ,  newcg ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:36 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:32 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									for_each_subsys ( root ,  ss )  { 
							 
						 
					
						
							
								
									
										
										
										
											2008-02-07 00:13:44 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										if  ( ss - > attach ) 
							 
						 
					
						
							
								
									
										
										
										
											2012-01-31 13:47:36 +08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
											ss - > attach ( cgrp ,  & tset ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:32 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									} 
							 
						 
					
						
							
								
									
										
										
										
											2011-05-26 16:25:20 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2010-03-10 15:22:03 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								out :  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									if  ( retval )  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										for_each_subsys ( root ,  ss )  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											if  ( ss  = =  failed_ss ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
												/*
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
												 *  This  subsystem  was  the  one  that  failed  the 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
												 *  can_attach ( )  check  earlier ,  so  we  don ' t  need 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
												 *  to  call  cancel_attach ( )  against  it  or  any 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
												 *  remaining  subsystems . 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
												 */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
												break ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											if  ( ss - > cancel_attach ) 
							 
						 
					
						
							
								
									
										
										
										
											2012-01-31 13:47:36 +08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
												ss - > cancel_attach ( cgrp ,  & tset ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2010-03-10 15:22:03 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										} 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									} 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									return  retval ; 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:32 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2010-05-30 22:24:39 +02:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								/**
  
						 
					
						
							
								
									
										
										
										
											2010-09-09 16:37:37 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								 *  cgroup_attach_task_all  -  attach  task  ' tsk '  to  all  cgroups  of  task  ' from ' 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  @ from :  attach  to  all  cgroups  of  a  given  task 
							 
						 
					
						
							
								
									
										
										
										
											2010-05-30 22:24:39 +02:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								 *  @ tsk :  the  task  to  be  attached 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 */ 
							 
						 
					
						
							
								
									
										
										
										
											2010-09-09 16:37:37 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								int  cgroup_attach_task_all ( struct  task_struct  * from ,  struct  task_struct  * tsk )  
						 
					
						
							
								
									
										
										
										
											2010-05-30 22:24:39 +02:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									struct  cgroupfs_root  * root ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									int  retval  =  0 ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									cgroup_lock ( ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									for_each_active_root ( root )  { 
							 
						 
					
						
							
								
									
										
										
										
											2010-09-09 16:37:37 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										struct  cgroup  * from_cg  =  task_cgroup_from_root ( from ,  root ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										retval  =  cgroup_attach_task ( from_cg ,  tsk ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2010-05-30 22:24:39 +02:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										if  ( retval ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											break ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									} 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									cgroup_unlock ( ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									return  retval ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
									
										
										
										
											2010-09-09 16:37:37 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								EXPORT_SYMBOL_GPL ( cgroup_attach_task_all ) ;  
						 
					
						
							
								
									
										
										
										
											2010-05-30 22:24:39 +02:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2011-05-26 16:25:20 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								/**
  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  cgroup_attach_proc  -  attach  all  threads  in  a  threadgroup  to  a  cgroup 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  @ cgrp :  the  cgroup  to  attach  to 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  @ leader :  the  threadgroup  leader  task_struct  of  the  group  to  be  attached 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 * 
							 
						 
					
						
							
								
									
										
										
										
											2011-12-12 18:12:21 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								 *  Call  holding  cgroup_mutex  and  the  group_rwsem  of  the  leader .  Will  take 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  task_lock  of  each  thread  in  leader ' s  threadgroup  individually  in  turn . 
							 
						 
					
						
							
								
									
										
										
										
											2011-05-26 16:25:20 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								 */ 
							 
						 
					
						
							
								
									
										
										
										
											2011-12-27 07:46:25 +02:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								static  int  cgroup_attach_proc ( struct  cgroup  * cgrp ,  struct  task_struct  * leader )  
						 
					
						
							
								
									
										
										
										
											2011-05-26 16:25:20 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									int  retval ,  i ,  group_size ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									struct  cgroup_subsys  * ss ,  * failed_ss  =  NULL ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									/* guaranteed to be initialized later, but the compiler needs this */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									struct  cgroupfs_root  * root  =  cgrp - > root ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									/* threadgroup list cursor and array */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									struct  task_struct  * tsk ; 
							 
						 
					
						
							
								
									
										
										
										
											2011-12-12 18:12:21 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									struct  task_and_cgroup  * tc ; 
							 
						 
					
						
							
								
									
										
										
										
											2011-05-26 16:25:21 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									struct  flex_array  * group ; 
							 
						 
					
						
							
								
									
										
										
										
											2011-12-12 18:12:21 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									struct  cgroup_taskset  tset  =  {  } ; 
							 
						 
					
						
							
								
									
										
										
										
											2011-05-26 16:25:20 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									/*
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  step  0 :  in  order  to  do  expensive ,  possibly  blocking  operations  for 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  every  thread ,  we  cannot  iterate  the  thread  group  list ,  since  it  needs 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  rcu  or  tasklist  locked .  instead ,  build  an  array  of  all  threads  in  the 
							 
						 
					
						
							
								
									
										
										
										
											2011-12-12 18:12:21 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									 *  group  -  group_rwsem  prevents  new  threads  from  appearing ,  and  if 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  threads  exit ,  this  will  just  be  an  over - estimate . 
							 
						 
					
						
							
								
									
										
										
										
											2011-05-26 16:25:20 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									 */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									group_size  =  get_nr_threads ( leader ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2011-05-26 16:25:21 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									/* flex_array supports very large thread-groups better than kmalloc. */ 
							 
						 
					
						
							
								
									
										
										
										
											2011-12-12 18:12:21 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									group  =  flex_array_alloc ( sizeof ( * tc ) ,  group_size ,  GFP_KERNEL ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2011-05-26 16:25:20 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									if  ( ! group ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										return  - ENOMEM ; 
							 
						 
					
						
							
								
									
										
										
										
											2011-05-26 16:25:21 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									/* pre-allocate to guarantee space while iterating in rcu read-side. */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									retval  =  flex_array_prealloc ( group ,  0 ,  group_size  -  1 ,  GFP_KERNEL ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									if  ( retval ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										goto  out_free_group_list ; 
							 
						 
					
						
							
								
									
										
										
										
											2011-05-26 16:25:20 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									tsk  =  leader ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									i  =  0 ; 
							 
						 
					
						
							
								
									
										
										
										
											2012-01-03 21:18:31 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									/*
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  Prevent  freeing  of  tasks  while  we  take  a  snapshot .  Tasks  that  are 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  already  PF_EXITING  could  be  freed  from  underneath  us  unless  we 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  take  an  rcu_read_lock . 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									rcu_read_lock ( ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2011-05-26 16:25:20 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									do  { 
							 
						 
					
						
							
								
									
										
										
										
											2011-12-12 18:12:21 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										struct  task_and_cgroup  ent ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2011-12-12 18:12:21 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										/* @tsk either already exited or can't exit until the end */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										if  ( tsk - > flags  &  PF_EXITING ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											continue ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2011-05-26 16:25:20 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										/* as per above, nr_threads may decrease, but not increase. */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										BUG_ON ( i  > =  group_size ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2011-12-12 18:12:21 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										ent . task  =  tsk ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										ent . cgrp  =  task_cgroup_from_root ( tsk ,  root ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2011-12-21 20:18:37 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										/* nothing to do if this task is already in the cgroup */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										if  ( ent . cgrp  = =  cgrp ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											continue ; 
							 
						 
					
						
							
								
									
										
										
										
											2012-01-30 12:51:56 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										/*
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										 *  saying  GFP_ATOMIC  has  no  effect  here  because  we  did  prealloc 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										 *  earlier ,  but  it ' s  good  form  to  communicate  our  expectations . 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										 */ 
							 
						 
					
						
							
								
									
										
										
										
											2011-12-12 18:12:21 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										retval  =  flex_array_put ( group ,  i ,  & ent ,  GFP_ATOMIC ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2011-05-26 16:25:21 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										BUG_ON ( retval  ! =  0 ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2011-05-26 16:25:20 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										i + + ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									}  while_each_thread ( leader ,  tsk ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2012-01-03 21:18:31 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									rcu_read_unlock ( ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2011-05-26 16:25:20 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									/* remember the number of threads in the array for later. */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									group_size  =  i ; 
							 
						 
					
						
							
								
									
										
										
										
											2011-12-12 18:12:21 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									tset . tc_array  =  group ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									tset . tc_array_len  =  group_size ; 
							 
						 
					
						
							
								
									
										
										
										
											2011-05-26 16:25:20 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2011-12-12 18:12:21 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									/* methods shouldn't be called if no task is actually migrating */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									retval  =  0 ; 
							 
						 
					
						
							
								
									
										
										
										
											2011-12-21 20:18:37 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									if  ( ! group_size ) 
							 
						 
					
						
							
								
									
										
										
										
											2011-12-21 20:18:36 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										goto  out_free_group_list ; 
							 
						 
					
						
							
								
									
										
										
										
											2011-12-12 18:12:21 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2011-05-26 16:25:20 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									/*
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  step  1 :  check  that  we  can  legitimately  attach  to  the  cgroup . 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									for_each_subsys ( root ,  ss )  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										if  ( ss - > can_attach )  { 
							 
						 
					
						
							
								
									
										
										
										
											2012-01-31 13:47:36 +08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
											retval  =  ss - > can_attach ( cgrp ,  & tset ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2011-05-26 16:25:20 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
											if  ( retval )  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
												failed_ss  =  ss ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
												goto  out_cancel_attach ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											} 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										} 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									} 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									/*
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  step  2 :  make  sure  css_sets  exist  for  all  threads  to  be  migrated . 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  we  use  find_css_set ,  which  allocates  a  new  one  if  necessary . 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									for  ( i  =  0 ;  i  <  group_size ;  i + + )  { 
							 
						 
					
						
							
								
									
										
										
										
											2011-12-12 18:12:21 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										tc  =  flex_array_get ( group ,  i ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2012-01-30 12:51:56 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										tc - > cg  =  find_css_set ( tc - > task - > cgroups ,  cgrp ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										if  ( ! tc - > cg )  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											retval  =  - ENOMEM ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											goto  out_put_css_set_refs ; 
							 
						 
					
						
							
								
									
										
										
										
											2011-05-26 16:25:20 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										} 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									} 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									/*
 
							 
						 
					
						
							
								
									
										
										
										
											2011-12-12 18:12:22 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									 *  step  3 :  now  that  we ' re  guaranteed  success  wrt  the  css_sets , 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  proceed  to  move  all  tasks  to  the  new  cgroup .   There  are  no 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  failure  cases  after  here ,  so  this  is  the  commit  point . 
							 
						 
					
						
							
								
									
										
										
										
											2011-05-26 16:25:20 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									 */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									for  ( i  =  0 ;  i  <  group_size ;  i + + )  { 
							 
						 
					
						
							
								
									
										
										
										
											2011-12-12 18:12:21 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										tc  =  flex_array_get ( group ,  i ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2012-01-30 12:51:56 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										cgroup_task_migrate ( cgrp ,  tc - > cgrp ,  tc - > task ,  tc - > cg ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2011-05-26 16:25:20 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									} 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									/* nothing is sensitive to fork() after this point. */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									/*
 
							 
						 
					
						
							
								
									
										
										
										
											2011-12-12 18:12:22 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									 *  step  4 :  do  subsystem  attach  callbacks . 
							 
						 
					
						
							
								
									
										
										
										
											2011-05-26 16:25:20 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									 */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									for_each_subsys ( root ,  ss )  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										if  ( ss - > attach ) 
							 
						 
					
						
							
								
									
										
										
										
											2012-01-31 13:47:36 +08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
											ss - > attach ( cgrp ,  & tset ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2011-05-26 16:25:20 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									} 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									/*
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  step  5 :  success !  and  cleanup 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									retval  =  0 ; 
							 
						 
					
						
							
								
									
										
										
										
											2012-01-30 12:51:56 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								out_put_css_set_refs :  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									if  ( retval )  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										for  ( i  =  0 ;  i  <  group_size ;  i + + )  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											tc  =  flex_array_get ( group ,  i ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											if  ( ! tc - > cg ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
												break ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											put_css_set ( tc - > cg ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										} 
							 
						 
					
						
							
								
									
										
										
										
											2011-05-26 16:25:20 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									} 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								out_cancel_attach :  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									if  ( retval )  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										for_each_subsys ( root ,  ss )  { 
							 
						 
					
						
							
								
									
										
										
										
											2011-12-12 18:12:22 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
											if  ( ss  = =  failed_ss ) 
							 
						 
					
						
							
								
									
										
										
										
											2011-05-26 16:25:20 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
												break ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											if  ( ss - > cancel_attach ) 
							 
						 
					
						
							
								
									
										
										
										
											2012-01-31 13:47:36 +08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
												ss - > cancel_attach ( cgrp ,  & tset ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2011-05-26 16:25:20 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										} 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									} 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								out_free_group_list :  
						 
					
						
							
								
									
										
										
										
											2011-05-26 16:25:21 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									flex_array_free ( group ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2011-05-26 16:25:20 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									return  retval ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								/*
  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  Find  the  task_struct  of  the  task  to  attach  by  vpid  and  pass  it  along  to  the 
							 
						 
					
						
							
								
									
										
										
										
											2011-12-12 18:12:21 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								 *  function  to  attach  either  it  or  all  tasks  in  its  threadgroup .  Will  lock 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  cgroup_mutex  and  threadgroup ;  may  take  task_lock  of  task . 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:32 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								 */ 
							 
						 
					
						
							
								
									
										
										
										
											2011-05-26 16:25:20 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								static  int  attach_task_by_pid ( struct  cgroup  * cgrp ,  u64  pid ,  bool  threadgroup )  
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:32 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									struct  task_struct  * tsk ; 
							 
						 
					
						
							
								
									
										
										
										
											2008-11-14 10:39:19 +11:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									const  struct  cred  * cred  =  current_cred ( ) ,  * tcred ; 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:32 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									int  ret ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2011-05-26 16:25:20 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									if  ( ! cgroup_lock_live_group ( cgrp ) ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										return  - ENODEV ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2012-01-03 21:18:30 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								retry_find_task :  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									rcu_read_lock ( ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:32 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									if  ( pid )  { 
							 
						 
					
						
							
								
									
										
										
										
											2008-02-07 00:14:47 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										tsk  =  find_task_by_vpid ( pid ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2011-05-26 16:25:20 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										if  ( ! tsk )  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											rcu_read_unlock ( ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2012-01-03 21:18:30 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
											ret =  - ESRCH ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											goto  out_unlock_cgroup ; 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:32 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										} 
							 
						 
					
						
							
								
									
										
										
										
											2011-05-26 16:25:20 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										/*
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										 *  even  if  we ' re  attaching  all  tasks  in  the  thread  group ,  we 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										 *  only  need  to  check  permissions  on  one  of  them . 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										 */ 
							 
						 
					
						
							
								
									
										
										
										
											2008-11-14 10:39:19 +11:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										tcred  =  __task_cred ( tsk ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2012-03-12 15:44:39 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										if  ( ! uid_eq ( cred - > euid ,  GLOBAL_ROOT_UID )  & & 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										    ! uid_eq ( cred - > euid ,  tcred - > uid )  & & 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										    ! uid_eq ( cred - > euid ,  tcred - > suid ) )  { 
							 
						 
					
						
							
								
									
										
										
										
											2008-11-14 10:39:19 +11:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
											rcu_read_unlock ( ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2012-01-03 21:18:30 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
											ret  =  - EACCES ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											goto  out_unlock_cgroup ; 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:32 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										} 
							 
						 
					
						
							
								
									
										
										
										
											2012-01-03 21:18:30 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									}  else 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										tsk  =  current ; 
							 
						 
					
						
							
								
									
										
										
										
											2011-12-12 18:12:21 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									if  ( threadgroup ) 
							 
						 
					
						
							
								
									
										
										
										
											2012-01-03 21:18:30 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										tsk  =  tsk - > group_leader ; 
							 
						 
					
						
							
								
									
										
										
										
											2012-04-21 09:13:46 +02:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									/*
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  Workqueue  threads  may  acquire  PF_THREAD_BOUND  and  become 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  trapped  in  a  cpuset ,  or  RT  worker  may  be  born  in  a  cgroup 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  with  no  rt_runtime  allocated .   Just  say  no . 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									if  ( tsk  = =  kthreadd_task  | |  ( tsk - > flags  &  PF_THREAD_BOUND ) )  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										ret  =  - EINVAL ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										rcu_read_unlock ( ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										goto  out_unlock_cgroup ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									} 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2012-01-03 21:18:30 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									get_task_struct ( tsk ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									rcu_read_unlock ( ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									threadgroup_lock ( tsk ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									if  ( threadgroup )  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										if  ( ! thread_group_leader ( tsk ) )  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											/*
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											 *  a  race  with  de_thread  from  another  thread ' s  exec ( ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											 *  may  strip  us  of  our  leadership ,  if  this  happens , 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											 *  there  is  no  choice  but  to  throw  this  task  away  and 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											 *  try  again ;  this  is 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											 *  " double-double-toil-and-trouble-check locking " . 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											 */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											threadgroup_unlock ( tsk ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											put_task_struct ( tsk ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											goto  retry_find_task ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										} 
							 
						 
					
						
							
								
									
										
										
										
											2011-05-26 16:25:20 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										ret  =  cgroup_attach_proc ( cgrp ,  tsk ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2012-01-03 21:18:30 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									}  else 
							 
						 
					
						
							
								
									
										
										
										
											2011-05-26 16:25:20 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										ret  =  cgroup_attach_task ( cgrp ,  tsk ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2011-12-12 18:12:21 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									threadgroup_unlock ( tsk ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:32 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									put_task_struct ( tsk ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2012-01-03 21:18:30 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								out_unlock_cgroup :  
						 
					
						
							
								
									
										
										
										
											2011-05-26 16:25:20 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									cgroup_unlock ( ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:32 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									return  ret ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2008-07-25 01:47:01 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								static  int  cgroup_tasks_write ( struct  cgroup  * cgrp ,  struct  cftype  * cft ,  u64  pid )  
						 
					
						
							
								
									
										
										
										
											2011-05-26 16:25:20 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									return  attach_task_by_pid ( cgrp ,  pid ,  false ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								static  int  cgroup_procs_write ( struct  cgroup  * cgrp ,  struct  cftype  * cft ,  u64  tgid )  
						 
					
						
							
								
									
										
										
										
											2008-07-25 01:47:01 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
									
										
										
										
											2012-01-03 21:18:30 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									return  attach_task_by_pid ( cgrp ,  tgid ,  true ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2008-07-25 01:47:01 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2008-07-25 01:46:59 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								/**
  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  cgroup_lock_live_group  -  take  cgroup_mutex  and  check  that  cgrp  is  alive . 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  @ cgrp :  the  cgroup  to  be  checked  for  liveness 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 * 
							 
						 
					
						
							
								
									
										
										
										
											2008-07-25 01:47:00 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								 *  On  success ,  returns  true ;  the  lock  should  be  later  released  with 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  cgroup_unlock ( ) .  On  failure  returns  false  with  no  lock  held . 
							 
						 
					
						
							
								
									
										
										
										
											2008-07-25 01:46:59 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								 */ 
							 
						 
					
						
							
								
									
										
										
										
											2008-07-25 01:47:00 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								bool  cgroup_lock_live_group ( struct  cgroup  * cgrp )  
						 
					
						
							
								
									
										
										
										
											2008-07-25 01:46:59 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									mutex_lock ( & cgroup_mutex ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									if  ( cgroup_is_removed ( cgrp ) )  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										mutex_unlock ( & cgroup_mutex ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										return  false ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									} 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									return  true ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
									
										
										
										
											2010-03-10 15:22:11 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								EXPORT_SYMBOL_GPL ( cgroup_lock_live_group ) ;  
						 
					
						
							
								
									
										
										
										
											2008-07-25 01:46:59 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								static  int  cgroup_release_agent_write ( struct  cgroup  * cgrp ,  struct  cftype  * cft ,  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
												      const  char  * buffer ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									BUILD_BUG_ON ( sizeof ( cgrp - > root - > release_agent_path )  <  PATH_MAX ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2010-10-27 15:33:37 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									if  ( strlen ( buffer )  > =  PATH_MAX ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										return  - EINVAL ; 
							 
						 
					
						
							
								
									
										
										
										
											2008-07-25 01:46:59 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									if  ( ! cgroup_lock_live_group ( cgrp ) ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										return  - ENODEV ; 
							 
						 
					
						
							
								
									
										
										
										
											2011-12-12 18:12:21 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									mutex_lock ( & cgroup_root_mutex ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2008-07-25 01:46:59 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									strcpy ( cgrp - > root - > release_agent_path ,  buffer ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2011-12-12 18:12:21 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									mutex_unlock ( & cgroup_root_mutex ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2008-07-25 01:47:00 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									cgroup_unlock ( ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2008-07-25 01:46:59 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									return  0 ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								static  int  cgroup_release_agent_show ( struct  cgroup  * cgrp ,  struct  cftype  * cft ,  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
												     struct  seq_file  * seq ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									if  ( ! cgroup_lock_live_group ( cgrp ) ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										return  - ENODEV ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									seq_puts ( seq ,  cgrp - > root - > release_agent_path ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									seq_putc ( seq ,  ' \n ' ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2008-07-25 01:47:00 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									cgroup_unlock ( ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2008-07-25 01:46:59 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									return  0 ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2008-07-25 01:47:00 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								/* A buffer size big enough for numbers or short strings */  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								# define CGROUP_LOCAL_BUFFER_SIZE 64 
  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2008-04-29 01:00:06 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								static  ssize_t  cgroup_write_X64 ( struct  cgroup  * cgrp ,  struct  cftype  * cft ,  
						 
					
						
							
								
									
										
										
										
											2008-04-29 00:59:56 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
												struct  file  * file , 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
												const  char  __user  * userbuf , 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
												size_t  nbytes ,  loff_t  * unused_ppos ) 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:33 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
									
										
										
										
											2008-07-25 01:47:00 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									char  buffer [ CGROUP_LOCAL_BUFFER_SIZE ] ; 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:33 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									int  retval  =  0 ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									char  * end ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									if  ( ! nbytes ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										return  - EINVAL ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									if  ( nbytes  > =  sizeof ( buffer ) ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										return  - E2BIG ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									if  ( copy_from_user ( buffer ,  userbuf ,  nbytes ) ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										return  - EFAULT ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									buffer [ nbytes ]  =  0 ;      /* nul-terminate */ 
							 
						 
					
						
							
								
									
										
										
										
											2008-04-29 01:00:06 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									if  ( cft - > write_u64 )  { 
							 
						 
					
						
							
								
									
										
										
										
											2009-10-26 16:49:36 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										u64  val  =  simple_strtoull ( strstrip ( buffer ) ,  & end ,  0 ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2008-04-29 01:00:06 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										if  ( * end ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											return  - EINVAL ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										retval  =  cft - > write_u64 ( cgrp ,  cft ,  val ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									}  else  { 
							 
						 
					
						
							
								
									
										
										
										
											2009-10-26 16:49:36 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										s64  val  =  simple_strtoll ( strstrip ( buffer ) ,  & end ,  0 ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2008-04-29 01:00:06 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										if  ( * end ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											return  - EINVAL ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										retval  =  cft - > write_s64 ( cgrp ,  cft ,  val ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									} 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:33 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									if  ( ! retval ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										retval  =  nbytes ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									return  retval ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2008-07-25 01:46:58 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								static  ssize_t  cgroup_write_string ( struct  cgroup  * cgrp ,  struct  cftype  * cft ,  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
												   struct  file  * file , 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
												   const  char  __user  * userbuf , 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
												   size_t  nbytes ,  loff_t  * unused_ppos ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
									
										
										
										
											2008-07-25 01:47:00 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									char  local_buffer [ CGROUP_LOCAL_BUFFER_SIZE ] ; 
							 
						 
					
						
							
								
									
										
										
										
											2008-07-25 01:46:58 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									int  retval  =  0 ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									size_t  max_bytes  =  cft - > max_write_len ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									char  * buffer  =  local_buffer ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									if  ( ! max_bytes ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										max_bytes  =  sizeof ( local_buffer )  -  1 ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									if  ( nbytes  > =  max_bytes ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										return  - E2BIG ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									/* Allocate a dynamic buffer if we need one */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									if  ( nbytes  > =  sizeof ( local_buffer ) )  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										buffer  =  kmalloc ( nbytes  +  1 ,  GFP_KERNEL ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										if  ( buffer  = =  NULL ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											return  - ENOMEM ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									} 
							 
						 
					
						
							
								
									
										
										
										
											2008-07-29 22:33:18 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									if  ( nbytes  & &  copy_from_user ( buffer ,  userbuf ,  nbytes ) )  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										retval  =  - EFAULT ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										goto  out ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									} 
							 
						 
					
						
							
								
									
										
										
										
											2008-07-25 01:46:58 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									buffer [ nbytes ]  =  0 ;      /* nul-terminate */ 
							 
						 
					
						
							
								
									
										
										
										
											2009-10-26 16:49:36 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									retval  =  cft - > write_string ( cgrp ,  cft ,  strstrip ( buffer ) ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2008-07-25 01:46:58 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									if  ( ! retval ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										retval  =  nbytes ; 
							 
						 
					
						
							
								
									
										
										
										
											2008-07-29 22:33:18 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								out :  
						 
					
						
							
								
									
										
										
										
											2008-07-25 01:46:58 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									if  ( buffer  ! =  local_buffer ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										kfree ( buffer ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									return  retval ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								static  ssize_t  cgroup_file_write ( struct  file  * file ,  const  char  __user  * buf ,  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
														size_t  nbytes ,  loff_t  * ppos ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									struct  cftype  * cft  =  __d_cft ( file - > f_dentry ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:40:44 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									struct  cgroup  * cgrp  =  __d_cgrp ( file - > f_dentry - > d_parent ) ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2009-01-07 18:07:33 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									if  ( cgroup_is_removed ( cgrp ) ) 
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
										return  - ENODEV ; 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:33 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									if  ( cft - > write ) 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:40:44 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										return  cft - > write ( cgrp ,  cft ,  file ,  buf ,  nbytes ,  ppos ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2008-04-29 01:00:06 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									if  ( cft - > write_u64  | |  cft - > write_s64 ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										return  cgroup_write_X64 ( cgrp ,  cft ,  file ,  buf ,  nbytes ,  ppos ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2008-07-25 01:46:58 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									if  ( cft - > write_string ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										return  cgroup_write_string ( cgrp ,  cft ,  file ,  buf ,  nbytes ,  ppos ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2008-04-29 01:00:08 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									if  ( cft - > trigger )  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										int  ret  =  cft - > trigger ( cgrp ,  ( unsigned  int ) cft - > private ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										return  ret  ?  ret  :  nbytes ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									} 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:33 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									return  - EINVAL ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2008-04-29 00:59:56 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								static  ssize_t  cgroup_read_u64 ( struct  cgroup  * cgrp ,  struct  cftype  * cft ,  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											       struct  file  * file , 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											       char  __user  * buf ,  size_t  nbytes , 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											       loff_t  * ppos ) 
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
									
										
										
										
											2008-07-25 01:47:00 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									char  tmp [ CGROUP_LOCAL_BUFFER_SIZE ] ; 
							 
						 
					
						
							
								
									
										
										
										
											2008-04-29 00:59:56 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									u64  val  =  cft - > read_u64 ( cgrp ,  cft ) ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
									int  len  =  sprintf ( tmp ,  " %llu \n " ,  ( unsigned  long  long )  val ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									return  simple_read_from_buffer ( buf ,  nbytes ,  ppos ,  tmp ,  len ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2008-04-29 01:00:06 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								static  ssize_t  cgroup_read_s64 ( struct  cgroup  * cgrp ,  struct  cftype  * cft ,  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											       struct  file  * file , 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											       char  __user  * buf ,  size_t  nbytes , 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											       loff_t  * ppos ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
									
										
										
										
											2008-07-25 01:47:00 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									char  tmp [ CGROUP_LOCAL_BUFFER_SIZE ] ; 
							 
						 
					
						
							
								
									
										
										
										
											2008-04-29 01:00:06 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									s64  val  =  cft - > read_s64 ( cgrp ,  cft ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									int  len  =  sprintf ( tmp ,  " %lld \n " ,  ( long  long )  val ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									return  simple_read_from_buffer ( buf ,  nbytes ,  ppos ,  tmp ,  len ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								static  ssize_t  cgroup_file_read ( struct  file  * file ,  char  __user  * buf ,  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
												   size_t  nbytes ,  loff_t  * ppos ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									struct  cftype  * cft  =  __d_cft ( file - > f_dentry ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:40:44 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									struct  cgroup  * cgrp  =  __d_cgrp ( file - > f_dentry - > d_parent ) ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2009-01-07 18:07:33 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									if  ( cgroup_is_removed ( cgrp ) ) 
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
										return  - ENODEV ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									if  ( cft - > read ) 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:40:44 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										return  cft - > read ( cgrp ,  cft ,  file ,  buf ,  nbytes ,  ppos ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2008-04-29 00:59:56 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									if  ( cft - > read_u64 ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										return  cgroup_read_u64 ( cgrp ,  cft ,  file ,  buf ,  nbytes ,  ppos ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2008-04-29 01:00:06 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									if  ( cft - > read_s64 ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										return  cgroup_read_s64 ( cgrp ,  cft ,  file ,  buf ,  nbytes ,  ppos ) ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
									return  - EINVAL ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2008-04-29 01:00:01 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								/*
  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  seqfile  ops / methods  for  returning  structured  data .  Currently  just 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  supports  string - > u64  maps ,  but  can  be  extended  in  future . 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								struct  cgroup_seqfile_state  {  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									struct  cftype  * cft ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									struct  cgroup  * cgroup ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								} ;  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								static  int  cgroup_map_add ( struct  cgroup_map_cb  * cb ,  const  char  * key ,  u64  value )  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									struct  seq_file  * sf  =  cb - > state ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									return  seq_printf ( sf ,  " %s %llu \n " ,  key ,  ( unsigned  long  long ) value ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								static  int  cgroup_seqfile_show ( struct  seq_file  * m ,  void  * arg )  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									struct  cgroup_seqfile_state  * state  =  m - > private ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									struct  cftype  * cft  =  state - > cft ; 
							 
						 
					
						
							
								
									
										
										
										
											2008-04-29 01:00:14 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									if  ( cft - > read_map )  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										struct  cgroup_map_cb  cb  =  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											. fill  =  cgroup_map_add , 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											. state  =  m , 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										} ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										return  cft - > read_map ( state - > cgroup ,  cft ,  & cb ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									} 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									return  cft - > read_seq_string ( state - > cgroup ,  cft ,  m ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2008-04-29 01:00:01 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2008-07-25 19:46:21 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								static  int  cgroup_seqfile_release ( struct  inode  * inode ,  struct  file  * file )  
						 
					
						
							
								
									
										
										
										
											2008-04-29 01:00:01 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									struct  seq_file  * seq  =  file - > private_data ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									kfree ( seq - > private ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									return  single_release ( inode ,  file ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2009-10-01 15:43:56 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								static  const  struct  file_operations  cgroup_seqfile_operations  =  {  
						 
					
						
							
								
									
										
										
										
											2008-04-29 01:00:01 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									. read  =  seq_read , 
							 
						 
					
						
							
								
									
										
										
										
											2008-07-25 01:46:59 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									. write  =  cgroup_file_write , 
							 
						 
					
						
							
								
									
										
										
										
											2008-04-29 01:00:01 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									. llseek  =  seq_lseek , 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									. release  =  cgroup_seqfile_release , 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								} ;  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								static  int  cgroup_file_open ( struct  inode  * inode ,  struct  file  * file )  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									int  err ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									struct  cftype  * cft ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									err  =  generic_file_open ( inode ,  file ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									if  ( err ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										return  err ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									cft  =  __d_cft ( file - > f_dentry ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2009-01-07 18:07:33 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2008-04-29 01:00:14 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									if  ( cft - > read_map  | |  cft - > read_seq_string )  { 
							 
						 
					
						
							
								
									
										
										
										
											2008-04-29 01:00:01 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										struct  cgroup_seqfile_state  * state  = 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											kzalloc ( sizeof ( * state ) ,  GFP_USER ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										if  ( ! state ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											return  - ENOMEM ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										state - > cft  =  cft ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										state - > cgroup  =  __d_cgrp ( file - > f_dentry - > d_parent ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										file - > f_op  =  & cgroup_seqfile_operations ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										err  =  single_open ( file ,  cgroup_seqfile_show ,  state ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										if  ( err  <  0 ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											kfree ( state ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									}  else  if  ( cft - > open ) 
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
										err  =  cft - > open ( inode ,  file ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									else 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										err  =  0 ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									return  err ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								static  int  cgroup_file_release ( struct  inode  * inode ,  struct  file  * file )  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									struct  cftype  * cft  =  __d_cft ( file - > f_dentry ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									if  ( cft - > release ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										return  cft - > release ( inode ,  file ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									return  0 ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								/*
  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  cgroup_rename  -  Only  allow  simple  rename  of  directories  in  place . 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								static  int  cgroup_rename ( struct  inode  * old_dir ,  struct  dentry  * old_dentry ,  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											    struct  inode  * new_dir ,  struct  dentry  * new_dentry ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									if  ( ! S_ISDIR ( old_dentry - > d_inode - > i_mode ) ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										return  - ENOTDIR ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									if  ( new_dentry - > d_inode ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										return  - EEXIST ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									if  ( old_dir  ! =  new_dir ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										return  - EIO ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									return  simple_rename ( old_dir ,  old_dentry ,  new_dir ,  new_dentry ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
											 
										
											
												cgroup: add xattr support
This is one of the items in the plumber's wish list.
For use cases:
>> What would the use case be for this?
>
> Attaching meta information to services, in an easily discoverable
> way. For example, in systemd we create one cgroup for each service, and
> could then store data like the main pid of the specific service as an
> xattr on the cgroup itself. That way we'd have almost all service state
> in the cgroupfs, which would make it possible to terminate systemd and
> later restart it without losing any state information. But there's more:
> for example, some very peculiar services cannot be terminated on
> shutdown (i.e. fakeraid DM stuff) and it would be really nice if the
> services in question could just mark that on their cgroup, by setting an
> xattr. On the more desktopy side of things there are other
> possibilities: for example there are plans defining what an application
> is along the lines of a cgroup (i.e. an app being a collection of
> processes). With xattrs one could then attach an icon or human readable
> program name on the cgroup.
>
> The key idea is that this would allow attaching runtime meta information
> to cgroups and everything they model (services, apps, vms), that doesn't
> need any complex userspace infrastructure, has good access control
> (i.e. because the file system enforces that anyway, and there's the
> "trusted." xattr namespace), notifications (inotify), and can easily be
> shared among applications.
>
> Lennart
v7:
- no changes
v6:
- remove user xattr namespace, only allow trusted and security
v5:
- check for capabilities before setting/removing xattrs
v4:
- no changes
v3:
- instead of config option, use mount option to enable xattr support
Original-patch-by: Li Zefan <lizefan@huawei.com>
Cc: Li Zefan <lizefan@huawei.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Hillf Danton <dhillf@gmail.com>
Cc: Lennart Poettering <lpoetter@redhat.com>
Signed-off-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Aristeu Rozanski <aris@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
											 
										 
										
											2012-08-23 16:53:30 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								static  struct  simple_xattrs  * __d_xattrs ( struct  dentry  * dentry )  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									if  ( S_ISDIR ( dentry - > d_inode - > i_mode ) ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										return  & __d_cgrp ( dentry ) - > xattrs ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									else 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										return  & __d_cft ( dentry ) - > xattrs ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								static  inline  int  xattr_enabled ( struct  dentry  * dentry )  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									struct  cgroupfs_root  * root  =  dentry - > d_sb - > s_fs_info ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									return  test_bit ( ROOT_XATTR ,  & root - > flags ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								static  bool  is_valid_xattr ( const  char  * name )  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									if  ( ! strncmp ( name ,  XATTR_TRUSTED_PREFIX ,  XATTR_TRUSTED_PREFIX_LEN )  | | 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									    ! strncmp ( name ,  XATTR_SECURITY_PREFIX ,  XATTR_SECURITY_PREFIX_LEN ) ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										return  true ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									return  false ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								static  int  cgroup_setxattr ( struct  dentry  * dentry ,  const  char  * name ,  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											   const  void  * val ,  size_t  size ,  int  flags ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									if  ( ! xattr_enabled ( dentry ) ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										return  - EOPNOTSUPP ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									if  ( ! is_valid_xattr ( name ) ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										return  - EINVAL ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									return  simple_xattr_set ( __d_xattrs ( dentry ) ,  name ,  val ,  size ,  flags ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								static  int  cgroup_removexattr ( struct  dentry  * dentry ,  const  char  * name )  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									if  ( ! xattr_enabled ( dentry ) ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										return  - EOPNOTSUPP ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									if  ( ! is_valid_xattr ( name ) ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										return  - EINVAL ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									return  simple_xattr_remove ( __d_xattrs ( dentry ) ,  name ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								static  ssize_t  cgroup_getxattr ( struct  dentry  * dentry ,  const  char  * name ,  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											       void  * buf ,  size_t  size ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									if  ( ! xattr_enabled ( dentry ) ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										return  - EOPNOTSUPP ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									if  ( ! is_valid_xattr ( name ) ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										return  - EINVAL ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									return  simple_xattr_get ( __d_xattrs ( dentry ) ,  name ,  buf ,  size ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								static  ssize_t  cgroup_listxattr ( struct  dentry  * dentry ,  char  * buf ,  size_t  size )  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									if  ( ! xattr_enabled ( dentry ) ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										return  - EOPNOTSUPP ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									return  simple_xattr_list ( __d_xattrs ( dentry ) ,  buf ,  size ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2009-10-01 15:43:56 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								static  const  struct  file_operations  cgroup_file_operations  =  {  
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
									. read  =  cgroup_file_read , 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									. write  =  cgroup_file_write , 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									. llseek  =  generic_file_llseek , 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									. open  =  cgroup_file_open , 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									. release  =  cgroup_file_release , 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								} ;  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
											 
										
											
												cgroup: add xattr support
This is one of the items in the plumber's wish list.
For use cases:
>> What would the use case be for this?
>
> Attaching meta information to services, in an easily discoverable
> way. For example, in systemd we create one cgroup for each service, and
> could then store data like the main pid of the specific service as an
> xattr on the cgroup itself. That way we'd have almost all service state
> in the cgroupfs, which would make it possible to terminate systemd and
> later restart it without losing any state information. But there's more:
> for example, some very peculiar services cannot be terminated on
> shutdown (i.e. fakeraid DM stuff) and it would be really nice if the
> services in question could just mark that on their cgroup, by setting an
> xattr. On the more desktopy side of things there are other
> possibilities: for example there are plans defining what an application
> is along the lines of a cgroup (i.e. an app being a collection of
> processes). With xattrs one could then attach an icon or human readable
> program name on the cgroup.
>
> The key idea is that this would allow attaching runtime meta information
> to cgroups and everything they model (services, apps, vms), that doesn't
> need any complex userspace infrastructure, has good access control
> (i.e. because the file system enforces that anyway, and there's the
> "trusted." xattr namespace), notifications (inotify), and can easily be
> shared among applications.
>
> Lennart
v7:
- no changes
v6:
- remove user xattr namespace, only allow trusted and security
v5:
- check for capabilities before setting/removing xattrs
v4:
- no changes
v3:
- instead of config option, use mount option to enable xattr support
Original-patch-by: Li Zefan <lizefan@huawei.com>
Cc: Li Zefan <lizefan@huawei.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Hillf Danton <dhillf@gmail.com>
Cc: Lennart Poettering <lpoetter@redhat.com>
Signed-off-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Aristeu Rozanski <aris@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
											 
										 
										
											2012-08-23 16:53:30 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								static  const  struct  inode_operations  cgroup_file_inode_operations  =  {  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									. setxattr  =  cgroup_setxattr , 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									. getxattr  =  cgroup_getxattr , 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									. listxattr  =  cgroup_listxattr , 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									. removexattr  =  cgroup_removexattr , 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								} ;  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2009-09-21 17:01:11 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								static  const  struct  inode_operations  cgroup_dir_inode_operations  =  {  
						 
					
						
							
								
									
										
										
										
											2011-01-14 05:31:45 +00:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									. lookup  =  cgroup_lookup , 
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
									. mkdir  =  cgroup_mkdir , 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									. rmdir  =  cgroup_rmdir , 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									. rename  =  cgroup_rename , 
							 
						 
					
						
							
								
									
										
											 
										
											
												cgroup: add xattr support
This is one of the items in the plumber's wish list.
For use cases:
>> What would the use case be for this?
>
> Attaching meta information to services, in an easily discoverable
> way. For example, in systemd we create one cgroup for each service, and
> could then store data like the main pid of the specific service as an
> xattr on the cgroup itself. That way we'd have almost all service state
> in the cgroupfs, which would make it possible to terminate systemd and
> later restart it without losing any state information. But there's more:
> for example, some very peculiar services cannot be terminated on
> shutdown (i.e. fakeraid DM stuff) and it would be really nice if the
> services in question could just mark that on their cgroup, by setting an
> xattr. On the more desktopy side of things there are other
> possibilities: for example there are plans defining what an application
> is along the lines of a cgroup (i.e. an app being a collection of
> processes). With xattrs one could then attach an icon or human readable
> program name on the cgroup.
>
> The key idea is that this would allow attaching runtime meta information
> to cgroups and everything they model (services, apps, vms), that doesn't
> need any complex userspace infrastructure, has good access control
> (i.e. because the file system enforces that anyway, and there's the
> "trusted." xattr namespace), notifications (inotify), and can easily be
> shared among applications.
>
> Lennart
v7:
- no changes
v6:
- remove user xattr namespace, only allow trusted and security
v5:
- check for capabilities before setting/removing xattrs
v4:
- no changes
v3:
- instead of config option, use mount option to enable xattr support
Original-patch-by: Li Zefan <lizefan@huawei.com>
Cc: Li Zefan <lizefan@huawei.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Hillf Danton <dhillf@gmail.com>
Cc: Lennart Poettering <lpoetter@redhat.com>
Signed-off-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Aristeu Rozanski <aris@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
											 
										 
										
											2012-08-23 16:53:30 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									. setxattr  =  cgroup_setxattr , 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									. getxattr  =  cgroup_getxattr , 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									. listxattr  =  cgroup_listxattr , 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									. removexattr  =  cgroup_removexattr , 
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								} ;  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2012-06-10 17:13:09 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								static  struct  dentry  * cgroup_lookup ( struct  inode  * dir ,  struct  dentry  * dentry ,  unsigned  int  flags )  
						 
					
						
							
								
									
										
										
										
											2011-01-14 05:31:45 +00:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									if  ( dentry - > d_name . len  >  NAME_MAX ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										return  ERR_PTR ( - ENAMETOOLONG ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									d_add ( dentry ,  NULL ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									return  NULL ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2010-03-10 15:22:20 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								/*
  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  Check  if  a  file  is  a  control  file 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								static  inline  struct  cftype  * __file_cft ( struct  file  * file )  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
									
										
										
										
											2013-01-23 17:07:38 -05:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									if  ( file_inode ( file ) - > i_fop  ! =  & cgroup_file_operations ) 
							 
						 
					
						
							
								
									
										
										
										
											2010-03-10 15:22:20 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										return  ERR_PTR ( - EINVAL ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									return  __d_cft ( file - > f_dentry ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2011-07-26 01:55:55 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								static  int  cgroup_create_file ( struct  dentry  * dentry ,  umode_t  mode ,  
						 
					
						
							
								
									
										
										
										
											2011-01-07 17:49:20 +11:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
												struct  super_block  * sb ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
									struct  inode  * inode ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									if  ( ! dentry ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										return  - ENOENT ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									if  ( dentry - > d_inode ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										return  - EEXIST ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									inode  =  cgroup_new_inode ( mode ,  sb ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									if  ( ! inode ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										return  - ENOMEM ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									if  ( S_ISDIR ( mode ) )  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										inode - > i_op  =  & cgroup_dir_inode_operations ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										inode - > i_fop  =  & simple_dir_operations ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										/* start off with i_nlink == 2 (for "." entry) */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										inc_nlink ( inode ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2012-11-19 08:13:36 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										inc_nlink ( dentry - > d_parent - > d_inode ) ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2012-11-19 08:13:37 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										/*
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										 *  Control  reaches  here  with  cgroup_mutex  held . 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										 *  @ inode - > i_mutex  should  nest  outside  cgroup_mutex  but  we 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										 *  want  to  populate  it  immediately  without  releasing 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										 *  cgroup_mutex .   As  @ inode  isn ' t  visible  to  anyone  else 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										 *  yet ,  trylock  will  always  succeed  without  affecting 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										 *  lockdep  checks . 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										 */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										WARN_ON_ONCE ( ! mutex_trylock ( & inode - > i_mutex ) ) ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
									}  else  if  ( S_ISREG ( mode ) )  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										inode - > i_size  =  0 ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										inode - > i_fop  =  & cgroup_file_operations ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												cgroup: add xattr support
This is one of the items in the plumber's wish list.
For use cases:
>> What would the use case be for this?
>
> Attaching meta information to services, in an easily discoverable
> way. For example, in systemd we create one cgroup for each service, and
> could then store data like the main pid of the specific service as an
> xattr on the cgroup itself. That way we'd have almost all service state
> in the cgroupfs, which would make it possible to terminate systemd and
> later restart it without losing any state information. But there's more:
> for example, some very peculiar services cannot be terminated on
> shutdown (i.e. fakeraid DM stuff) and it would be really nice if the
> services in question could just mark that on their cgroup, by setting an
> xattr. On the more desktopy side of things there are other
> possibilities: for example there are plans defining what an application
> is along the lines of a cgroup (i.e. an app being a collection of
> processes). With xattrs one could then attach an icon or human readable
> program name on the cgroup.
>
> The key idea is that this would allow attaching runtime meta information
> to cgroups and everything they model (services, apps, vms), that doesn't
> need any complex userspace infrastructure, has good access control
> (i.e. because the file system enforces that anyway, and there's the
> "trusted." xattr namespace), notifications (inotify), and can easily be
> shared among applications.
>
> Lennart
v7:
- no changes
v6:
- remove user xattr namespace, only allow trusted and security
v5:
- check for capabilities before setting/removing xattrs
v4:
- no changes
v3:
- instead of config option, use mount option to enable xattr support
Original-patch-by: Li Zefan <lizefan@huawei.com>
Cc: Li Zefan <lizefan@huawei.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Hillf Danton <dhillf@gmail.com>
Cc: Lennart Poettering <lpoetter@redhat.com>
Signed-off-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Aristeu Rozanski <aris@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
											 
										 
										
											2012-08-23 16:53:30 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										inode - > i_op  =  & cgroup_file_inode_operations ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
									} 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									d_instantiate ( dentry ,  inode ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									dget ( dentry ) ; 	/* Extra count - pin the dentry in core */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									return  0 ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2009-04-02 16:57:29 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								/**
  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  cgroup_file_mode  -  deduce  file  mode  of  a  control  file 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  @ cft :  the  control  file  in  question 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 * 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  returns  cft - > mode  if  - > mode  is  not  0 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  returns  S_IRUGO | S_IWUSR  if  it  has  both  a  read  and  a  write  handler 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  returns  S_IRUGO  if  it  has  only  a  read  handler 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  returns  S_IWUSR  if  it  has  only  a  write  hander 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 */ 
							 
						 
					
						
							
								
									
										
										
										
											2011-07-26 01:55:55 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								static  umode_t  cgroup_file_mode ( const  struct  cftype  * cft )  
						 
					
						
							
								
									
										
										
										
											2009-04-02 16:57:29 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
									
										
										
										
											2011-07-26 01:55:55 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									umode_t  mode  =  0 ; 
							 
						 
					
						
							
								
									
										
										
										
											2009-04-02 16:57:29 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									if  ( cft - > mode ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										return  cft - > mode ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									if  ( cft - > read  | |  cft - > read_u64  | |  cft - > read_s64  | | 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									    cft - > read_map  | |  cft - > read_seq_string ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										mode  | =  S_IRUGO ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									if  ( cft - > write  | |  cft - > write_u64  | |  cft - > write_s64  | | 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									    cft - > write_string  | |  cft - > trigger ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										mode  | =  S_IWUSR ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									return  mode ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2012-04-01 12:09:55 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								static  int  cgroup_add_file ( struct  cgroup  * cgrp ,  struct  cgroup_subsys  * subsys ,  
						 
					
						
							
								
									
										
											 
										
											
												cgroup: add xattr support
This is one of the items in the plumber's wish list.
For use cases:
>> What would the use case be for this?
>
> Attaching meta information to services, in an easily discoverable
> way. For example, in systemd we create one cgroup for each service, and
> could then store data like the main pid of the specific service as an
> xattr on the cgroup itself. That way we'd have almost all service state
> in the cgroupfs, which would make it possible to terminate systemd and
> later restart it without losing any state information. But there's more:
> for example, some very peculiar services cannot be terminated on
> shutdown (i.e. fakeraid DM stuff) and it would be really nice if the
> services in question could just mark that on their cgroup, by setting an
> xattr. On the more desktopy side of things there are other
> possibilities: for example there are plans defining what an application
> is along the lines of a cgroup (i.e. an app being a collection of
> processes). With xattrs one could then attach an icon or human readable
> program name on the cgroup.
>
> The key idea is that this would allow attaching runtime meta information
> to cgroups and everything they model (services, apps, vms), that doesn't
> need any complex userspace infrastructure, has good access control
> (i.e. because the file system enforces that anyway, and there's the
> "trusted." xattr namespace), notifications (inotify), and can easily be
> shared among applications.
>
> Lennart
v7:
- no changes
v6:
- remove user xattr namespace, only allow trusted and security
v5:
- check for capabilities before setting/removing xattrs
v4:
- no changes
v3:
- instead of config option, use mount option to enable xattr support
Original-patch-by: Li Zefan <lizefan@huawei.com>
Cc: Li Zefan <lizefan@huawei.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Hillf Danton <dhillf@gmail.com>
Cc: Lennart Poettering <lpoetter@redhat.com>
Signed-off-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Aristeu Rozanski <aris@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
											 
										 
										
											2012-08-23 16:53:30 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
											   struct  cftype  * cft ) 
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:40:44 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									struct  dentry  * dir  =  cgrp - > dentry ; 
							 
						 
					
						
							
								
									
										
										
										
											2012-04-01 12:09:56 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									struct  cgroup  * parent  =  __d_cgrp ( dir ) ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
									struct  dentry  * dentry ; 
							 
						 
					
						
							
								
									
										
										
										
											2012-04-01 12:09:56 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									struct  cfent  * cfe ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
									int  error ; 
							 
						 
					
						
							
								
									
										
										
										
											2011-07-26 01:55:55 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									umode_t  mode ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
									char  name [ MAX_CGROUP_TYPE_NAMELEN  +  MAX_CFTYPE_NAME  +  2 ]  =  {  0  } ; 
							 
						 
					
						
							
								
									
										
										
										
											2012-04-01 12:09:55 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
											 
										
											
												cgroup: add xattr support
This is one of the items in the plumber's wish list.
For use cases:
>> What would the use case be for this?
>
> Attaching meta information to services, in an easily discoverable
> way. For example, in systemd we create one cgroup for each service, and
> could then store data like the main pid of the specific service as an
> xattr on the cgroup itself. That way we'd have almost all service state
> in the cgroupfs, which would make it possible to terminate systemd and
> later restart it without losing any state information. But there's more:
> for example, some very peculiar services cannot be terminated on
> shutdown (i.e. fakeraid DM stuff) and it would be really nice if the
> services in question could just mark that on their cgroup, by setting an
> xattr. On the more desktopy side of things there are other
> possibilities: for example there are plans defining what an application
> is along the lines of a cgroup (i.e. an app being a collection of
> processes). With xattrs one could then attach an icon or human readable
> program name on the cgroup.
>
> The key idea is that this would allow attaching runtime meta information
> to cgroups and everything they model (services, apps, vms), that doesn't
> need any complex userspace infrastructure, has good access control
> (i.e. because the file system enforces that anyway, and there's the
> "trusted." xattr namespace), notifications (inotify), and can easily be
> shared among applications.
>
> Lennart
v7:
- no changes
v6:
- remove user xattr namespace, only allow trusted and security
v5:
- check for capabilities before setting/removing xattrs
v4:
- no changes
v3:
- instead of config option, use mount option to enable xattr support
Original-patch-by: Li Zefan <lizefan@huawei.com>
Cc: Li Zefan <lizefan@huawei.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Hillf Danton <dhillf@gmail.com>
Cc: Lennart Poettering <lpoetter@redhat.com>
Signed-off-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Aristeu Rozanski <aris@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
											 
										 
										
											2012-08-23 16:53:30 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									simple_xattrs_init ( & cft - > xattrs ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:40:44 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									if  ( subsys  & &  ! test_bit ( ROOT_NOPREFIX ,  & cgrp - > root - > flags ) )  { 
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
										strcpy ( name ,  subsys - > name ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										strcat ( name ,  " . " ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									} 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									strcat ( name ,  cft - > name ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2012-04-01 12:09:56 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
									BUG_ON ( ! mutex_is_locked ( & dir - > d_inode - > i_mutex ) ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2012-04-01 12:09:56 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									cfe  =  kzalloc ( sizeof ( * cfe ) ,  GFP_KERNEL ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									if  ( ! cfe ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										return  - ENOMEM ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
									dentry  =  lookup_one_len ( name ,  dir ,  strlen ( name ) ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2012-04-01 12:09:56 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									if  ( IS_ERR ( dentry ) )  { 
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
										error  =  PTR_ERR ( dentry ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2012-04-01 12:09:56 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										goto  out ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									} 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									mode  =  cgroup_file_mode ( cft ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									error  =  cgroup_create_file ( dentry ,  mode  |  S_IFREG ,  cgrp - > root - > sb ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									if  ( ! error )  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										cfe - > type  =  ( void  * ) cft ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										cfe - > dentry  =  dentry ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										dentry - > d_fsdata  =  cfe ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										list_add_tail ( & cfe - > node ,  & parent - > files ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										cfe  =  NULL ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									} 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									dput ( dentry ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								out :  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									kfree ( cfe ) ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
									return  error ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2012-04-01 12:09:56 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								static  int  cgroup_addrm_files ( struct  cgroup  * cgrp ,  struct  cgroup_subsys  * subsys ,  
						 
					
						
							
								
									
										
											 
										
											
												cgroup: add xattr support
This is one of the items in the plumber's wish list.
For use cases:
>> What would the use case be for this?
>
> Attaching meta information to services, in an easily discoverable
> way. For example, in systemd we create one cgroup for each service, and
> could then store data like the main pid of the specific service as an
> xattr on the cgroup itself. That way we'd have almost all service state
> in the cgroupfs, which would make it possible to terminate systemd and
> later restart it without losing any state information. But there's more:
> for example, some very peculiar services cannot be terminated on
> shutdown (i.e. fakeraid DM stuff) and it would be really nice if the
> services in question could just mark that on their cgroup, by setting an
> xattr. On the more desktopy side of things there are other
> possibilities: for example there are plans defining what an application
> is along the lines of a cgroup (i.e. an app being a collection of
> processes). With xattrs one could then attach an icon or human readable
> program name on the cgroup.
>
> The key idea is that this would allow attaching runtime meta information
> to cgroups and everything they model (services, apps, vms), that doesn't
> need any complex userspace infrastructure, has good access control
> (i.e. because the file system enforces that anyway, and there's the
> "trusted." xattr namespace), notifications (inotify), and can easily be
> shared among applications.
>
> Lennart
v7:
- no changes
v6:
- remove user xattr namespace, only allow trusted and security
v5:
- check for capabilities before setting/removing xattrs
v4:
- no changes
v3:
- instead of config option, use mount option to enable xattr support
Original-patch-by: Li Zefan <lizefan@huawei.com>
Cc: Li Zefan <lizefan@huawei.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Hillf Danton <dhillf@gmail.com>
Cc: Lennart Poettering <lpoetter@redhat.com>
Signed-off-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Aristeu Rozanski <aris@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
											 
										 
										
											2012-08-23 16:53:30 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
											      struct  cftype  cfts [ ] ,  bool  is_add ) 
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
									
										
											 
										
											
												cgroup: add xattr support
This is one of the items in the plumber's wish list.
For use cases:
>> What would the use case be for this?
>
> Attaching meta information to services, in an easily discoverable
> way. For example, in systemd we create one cgroup for each service, and
> could then store data like the main pid of the specific service as an
> xattr on the cgroup itself. That way we'd have almost all service state
> in the cgroupfs, which would make it possible to terminate systemd and
> later restart it without losing any state information. But there's more:
> for example, some very peculiar services cannot be terminated on
> shutdown (i.e. fakeraid DM stuff) and it would be really nice if the
> services in question could just mark that on their cgroup, by setting an
> xattr. On the more desktopy side of things there are other
> possibilities: for example there are plans defining what an application
> is along the lines of a cgroup (i.e. an app being a collection of
> processes). With xattrs one could then attach an icon or human readable
> program name on the cgroup.
>
> The key idea is that this would allow attaching runtime meta information
> to cgroups and everything they model (services, apps, vms), that doesn't
> need any complex userspace infrastructure, has good access control
> (i.e. because the file system enforces that anyway, and there's the
> "trusted." xattr namespace), notifications (inotify), and can easily be
> shared among applications.
>
> Lennart
v7:
- no changes
v6:
- remove user xattr namespace, only allow trusted and security
v5:
- check for capabilities before setting/removing xattrs
v4:
- no changes
v3:
- instead of config option, use mount option to enable xattr support
Original-patch-by: Li Zefan <lizefan@huawei.com>
Cc: Li Zefan <lizefan@huawei.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Hillf Danton <dhillf@gmail.com>
Cc: Lennart Poettering <lpoetter@redhat.com>
Signed-off-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Aristeu Rozanski <aris@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
											 
										 
										
											2012-08-23 16:53:30 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									struct  cftype  * cft ; 
							 
						 
					
						
							
								
									
										
										
										
											2012-04-01 12:09:55 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									int  err ,  ret  =  0 ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									for  ( cft  =  cfts ;  cft - > name [ 0 ]  ! =  ' \0 ' ;  cft + + )  { 
							 
						 
					
						
							
								
									
										
										
										
											2012-12-06 14:38:57 +08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										/* does cft->flags tell us to skip this file on @cgrp? */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										if  ( ( cft - > flags  &  CFTYPE_NOT_ON_ROOT )  & &  ! cgrp - > parent ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											continue ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										if  ( ( cft - > flags  &  CFTYPE_ONLY_ON_ROOT )  & &  cgrp - > parent ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											continue ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2013-01-21 18:18:33 +08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										if  ( is_add )  { 
							 
						 
					
						
							
								
									
										
										
										
											2012-04-01 12:09:56 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
											err  =  cgroup_add_file ( cgrp ,  subsys ,  cft ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2013-01-21 18:18:33 +08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
											if  ( err ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
												pr_warn ( " cgroup_addrm_files: failed to add %s, err=%d \n " , 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
													cft - > name ,  err ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2012-04-01 12:09:55 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
											ret  =  err ; 
							 
						 
					
						
							
								
									
										
										
										
											2013-01-21 18:18:33 +08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										}  else  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											cgroup_rm_file ( cgrp ,  cft ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2012-04-01 12:09:55 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										} 
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
									} 
							 
						 
					
						
							
								
									
										
										
										
											2012-04-01 12:09:55 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									return  ret ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2012-04-01 12:09:55 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								static  DEFINE_MUTEX ( cgroup_cft_mutex ) ;  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								static  void  cgroup_cfts_prepare ( void )  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									__acquires ( & cgroup_cft_mutex )  __acquires ( & cgroup_mutex ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									/*
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  Thanks  to  the  entanglement  with  vfs  inode  locking ,  we  can ' t  walk 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  the  existing  cgroups  under  cgroup_mutex  and  create  files . 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  Instead ,  we  increment  reference  on  all  cgroups  and  build  list  of 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  them  using  @ cgrp - > cft_q_node .   Grab  cgroup_cft_mutex  to  ensure 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  exclusive  access  to  the  field . 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									mutex_lock ( & cgroup_cft_mutex ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									mutex_lock ( & cgroup_mutex ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								static  void  cgroup_cfts_commit ( struct  cgroup_subsys  * ss ,  
						 
					
						
							
								
									
										
											 
										
											
												cgroup: add xattr support
This is one of the items in the plumber's wish list.
For use cases:
>> What would the use case be for this?
>
> Attaching meta information to services, in an easily discoverable
> way. For example, in systemd we create one cgroup for each service, and
> could then store data like the main pid of the specific service as an
> xattr on the cgroup itself. That way we'd have almost all service state
> in the cgroupfs, which would make it possible to terminate systemd and
> later restart it without losing any state information. But there's more:
> for example, some very peculiar services cannot be terminated on
> shutdown (i.e. fakeraid DM stuff) and it would be really nice if the
> services in question could just mark that on their cgroup, by setting an
> xattr. On the more desktopy side of things there are other
> possibilities: for example there are plans defining what an application
> is along the lines of a cgroup (i.e. an app being a collection of
> processes). With xattrs one could then attach an icon or human readable
> program name on the cgroup.
>
> The key idea is that this would allow attaching runtime meta information
> to cgroups and everything they model (services, apps, vms), that doesn't
> need any complex userspace infrastructure, has good access control
> (i.e. because the file system enforces that anyway, and there's the
> "trusted." xattr namespace), notifications (inotify), and can easily be
> shared among applications.
>
> Lennart
v7:
- no changes
v6:
- remove user xattr namespace, only allow trusted and security
v5:
- check for capabilities before setting/removing xattrs
v4:
- no changes
v3:
- instead of config option, use mount option to enable xattr support
Original-patch-by: Li Zefan <lizefan@huawei.com>
Cc: Li Zefan <lizefan@huawei.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Hillf Danton <dhillf@gmail.com>
Cc: Lennart Poettering <lpoetter@redhat.com>
Signed-off-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Aristeu Rozanski <aris@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
											 
										 
										
											2012-08-23 16:53:30 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
											       struct  cftype  * cfts ,  bool  is_add ) 
							 
						 
					
						
							
								
									
										
										
										
											2012-04-01 12:09:55 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									__releases ( & cgroup_mutex )  __releases ( & cgroup_cft_mutex ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									LIST_HEAD ( pending ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									struct  cgroup  * cgrp ,  * n ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									/* %NULL @cfts indicates abort and don't bother if @ss isn't attached */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									if  ( cfts  & &  ss - > root  ! =  & rootnode )  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										list_for_each_entry ( cgrp ,  & ss - > root - > allcg_list ,  allcg_node )  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											dget ( cgrp - > dentry ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											list_add_tail ( & cgrp - > cft_q_node ,  & pending ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										} 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									} 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									mutex_unlock ( & cgroup_mutex ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									/*
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  All  new  cgroups  will  see  @ cfts  update  on  @ ss - > cftsets .   Add / rm 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  files  for  all  cgroups  which  were  created  before . 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									list_for_each_entry_safe ( cgrp ,  n ,  & pending ,  cft_q_node )  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										struct  inode  * inode  =  cgrp - > dentry - > d_inode ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										mutex_lock ( & inode - > i_mutex ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										mutex_lock ( & cgroup_mutex ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										if  ( ! cgroup_is_removed ( cgrp ) ) 
							 
						 
					
						
							
								
									
										
										
										
											2012-04-01 12:09:56 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
											cgroup_addrm_files ( cgrp ,  ss ,  cfts ,  is_add ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2012-04-01 12:09:55 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										mutex_unlock ( & cgroup_mutex ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										mutex_unlock ( & inode - > i_mutex ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										list_del_init ( & cgrp - > cft_q_node ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										dput ( cgrp - > dentry ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									} 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									mutex_unlock ( & cgroup_cft_mutex ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								/**
  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  cgroup_add_cftypes  -  add  an  array  of  cftypes  to  a  subsystem 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  @ ss :  target  cgroup  subsystem 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  @ cfts :  zero - length  name  terminated  array  of  cftypes 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 * 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  Register  @ cfts  to  @ ss .   Files  described  by  @ cfts  are  created  for  all 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  existing  cgroups  to  which  @ ss  is  attached  and  all  future  cgroups  will 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  have  them  too .   This  function  can  be  called  anytime  whether  @ ss  is 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  attached  or  not . 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 * 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  Returns  0  on  successful  registration ,  - errno  on  failure .   Note  that  this 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  function  currently  returns  0  as  long  as  @ cfts  registration  is  successful 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  even  if  some  file  creation  attempts  on  existing  cgroups  fail . 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 */ 
							 
						 
					
						
							
								
									
										
											 
										
											
												cgroup: add xattr support
This is one of the items in the plumber's wish list.
For use cases:
>> What would the use case be for this?
>
> Attaching meta information to services, in an easily discoverable
> way. For example, in systemd we create one cgroup for each service, and
> could then store data like the main pid of the specific service as an
> xattr on the cgroup itself. That way we'd have almost all service state
> in the cgroupfs, which would make it possible to terminate systemd and
> later restart it without losing any state information. But there's more:
> for example, some very peculiar services cannot be terminated on
> shutdown (i.e. fakeraid DM stuff) and it would be really nice if the
> services in question could just mark that on their cgroup, by setting an
> xattr. On the more desktopy side of things there are other
> possibilities: for example there are plans defining what an application
> is along the lines of a cgroup (i.e. an app being a collection of
> processes). With xattrs one could then attach an icon or human readable
> program name on the cgroup.
>
> The key idea is that this would allow attaching runtime meta information
> to cgroups and everything they model (services, apps, vms), that doesn't
> need any complex userspace infrastructure, has good access control
> (i.e. because the file system enforces that anyway, and there's the
> "trusted." xattr namespace), notifications (inotify), and can easily be
> shared among applications.
>
> Lennart
v7:
- no changes
v6:
- remove user xattr namespace, only allow trusted and security
v5:
- check for capabilities before setting/removing xattrs
v4:
- no changes
v3:
- instead of config option, use mount option to enable xattr support
Original-patch-by: Li Zefan <lizefan@huawei.com>
Cc: Li Zefan <lizefan@huawei.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Hillf Danton <dhillf@gmail.com>
Cc: Lennart Poettering <lpoetter@redhat.com>
Signed-off-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Aristeu Rozanski <aris@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
											 
										 
										
											2012-08-23 16:53:30 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								int  cgroup_add_cftypes ( struct  cgroup_subsys  * ss ,  struct  cftype  * cfts )  
						 
					
						
							
								
									
										
										
										
											2012-04-01 12:09:55 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									struct  cftype_set  * set ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									set  =  kzalloc ( sizeof ( * set ) ,  GFP_KERNEL ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									if  ( ! set ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										return  - ENOMEM ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									cgroup_cfts_prepare ( ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									set - > cfts  =  cfts ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									list_add_tail ( & set - > node ,  & ss - > cftsets ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2012-04-01 12:09:56 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									cgroup_cfts_commit ( ss ,  cfts ,  true ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2012-04-01 12:09:55 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									return  0 ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								EXPORT_SYMBOL_GPL ( cgroup_add_cftypes ) ;  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2012-04-01 12:09:56 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								/**
  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  cgroup_rm_cftypes  -  remove  an  array  of  cftypes  from  a  subsystem 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  @ ss :  target  cgroup  subsystem 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  @ cfts :  zero - length  name  terminated  array  of  cftypes 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 * 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  Unregister  @ cfts  from  @ ss .   Files  described  by  @ cfts  are  removed  from 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  all  existing  cgroups  to  which  @ ss  is  attached  and  all  future  cgroups 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  won ' t  have  them  either .   This  function  can  be  called  anytime  whether  @ ss 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  is  attached  or  not . 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 * 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  Returns  0  on  successful  unregistration ,  - ENOENT  if  @ cfts  is  not 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  registered  with  @ ss . 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 */ 
							 
						 
					
						
							
								
									
										
											 
										
											
												cgroup: add xattr support
This is one of the items in the plumber's wish list.
For use cases:
>> What would the use case be for this?
>
> Attaching meta information to services, in an easily discoverable
> way. For example, in systemd we create one cgroup for each service, and
> could then store data like the main pid of the specific service as an
> xattr on the cgroup itself. That way we'd have almost all service state
> in the cgroupfs, which would make it possible to terminate systemd and
> later restart it without losing any state information. But there's more:
> for example, some very peculiar services cannot be terminated on
> shutdown (i.e. fakeraid DM stuff) and it would be really nice if the
> services in question could just mark that on their cgroup, by setting an
> xattr. On the more desktopy side of things there are other
> possibilities: for example there are plans defining what an application
> is along the lines of a cgroup (i.e. an app being a collection of
> processes). With xattrs one could then attach an icon or human readable
> program name on the cgroup.
>
> The key idea is that this would allow attaching runtime meta information
> to cgroups and everything they model (services, apps, vms), that doesn't
> need any complex userspace infrastructure, has good access control
> (i.e. because the file system enforces that anyway, and there's the
> "trusted." xattr namespace), notifications (inotify), and can easily be
> shared among applications.
>
> Lennart
v7:
- no changes
v6:
- remove user xattr namespace, only allow trusted and security
v5:
- check for capabilities before setting/removing xattrs
v4:
- no changes
v3:
- instead of config option, use mount option to enable xattr support
Original-patch-by: Li Zefan <lizefan@huawei.com>
Cc: Li Zefan <lizefan@huawei.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Hillf Danton <dhillf@gmail.com>
Cc: Lennart Poettering <lpoetter@redhat.com>
Signed-off-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Aristeu Rozanski <aris@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
											 
										 
										
											2012-08-23 16:53:30 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								int  cgroup_rm_cftypes ( struct  cgroup_subsys  * ss ,  struct  cftype  * cfts )  
						 
					
						
							
								
									
										
										
										
											2012-04-01 12:09:56 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									struct  cftype_set  * set ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									cgroup_cfts_prepare ( ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									list_for_each_entry ( set ,  & ss - > cftsets ,  node )  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										if  ( set - > cfts  = =  cfts )  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											list_del_init ( & set - > node ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											cgroup_cfts_commit ( ss ,  cfts ,  false ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											return  0 ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										} 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									} 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									cgroup_cfts_commit ( ss ,  NULL ,  false ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									return  - ENOENT ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2008-02-23 15:24:09 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								/**
  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  cgroup_task_count  -  count  the  number  of  tasks  in  a  cgroup . 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  @ cgrp :  the  cgroup  in  question 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 * 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  Return  the  number  of  tasks  in  the  cgroup . 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 */ 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:40:44 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								int  cgroup_task_count ( const  struct  cgroup  * cgrp )  
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:32 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									int  count  =  0 ; 
							 
						 
					
						
							
								
									
										
										
										
											2008-07-25 01:46:55 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									struct  cg_cgroup_link  * link ; 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:36 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									read_lock ( & css_set_lock ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2008-07-25 01:46:55 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									list_for_each_entry ( link ,  & cgrp - > css_sets ,  cgrp_link_list )  { 
							 
						 
					
						
							
								
									
										
										
										
											2008-10-18 20:28:03 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										count  + =  atomic_read ( & link - > cg - > refcount ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:36 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									} 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									read_unlock ( & css_set_lock ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:32 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									return  count ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:36 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								/*
  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  Advance  a  list_head  iterator .   The  iterator  should  be  positioned  at 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  the  start  of  a  css_set 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 */ 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:40:44 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								static  void  cgroup_advance_iter ( struct  cgroup  * cgrp ,  
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:22 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
												struct  cgroup_iter  * it ) 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:36 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									struct  list_head  * l  =  it - > cg_link ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									struct  cg_cgroup_link  * link ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									struct  css_set  * cg ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									/* Advance to the next non-empty css_set */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									do  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										l  =  l - > next ; 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:40:44 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										if  ( l  = =  & cgrp - > css_sets )  { 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:36 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
											it - > cg_link  =  NULL ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											return ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										} 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:40:44 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										link  =  list_entry ( l ,  struct  cg_cgroup_link ,  cgrp_link_list ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:36 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										cg  =  link - > cg ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									}  while  ( list_empty ( & cg - > tasks ) ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									it - > cg_link  =  l ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									it - > task  =  cg - > tasks . next ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2008-02-07 00:14:42 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								/*
  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  To  reduce  the  fork ( )  overhead  for  systems  that  are  not  actually 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  using  their  cgroups  capability ,  we  don ' t  maintain  the  lists  running 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  through  each  css_set  to  its  tasks  until  we  see  the  list  actually 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  used  -  in  other  words  after  the  first  call  to  cgroup_iter_start ( ) . 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 */ 
							 
						 
					
						
							
								
									
										
										
										
											2008-04-29 00:59:54 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								static  void  cgroup_enable_task_cg_lists ( void )  
						 
					
						
							
								
									
										
										
										
											2008-02-07 00:14:42 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									struct  task_struct  * p ,  * g ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									write_lock ( & css_set_lock ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									use_task_css_set_links  =  1 ; 
							 
						 
					
						
							
								
									
										
										
										
											2012-02-08 03:37:27 +01:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									/*
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  We  need  tasklist_lock  because  RCU  is  not  safe  against 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  while_each_thread ( ) .  Besides ,  a  forking  task  that  has  passed 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  cgroup_post_fork ( )  without  seeing  use_task_css_set_links  =  1 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  is  not  guaranteed  to  have  its  child  immediately  visible  in  the 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  tasklist  if  we  walk  through  it  with  RCU . 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									read_lock ( & tasklist_lock ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2008-02-07 00:14:42 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									do_each_thread ( g ,  p )  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										task_lock ( p ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2008-04-17 11:37:15 +08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										/*
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										 *  We  should  check  if  the  process  is  exiting ,  otherwise 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										 *  it  will  race  with  cgroup_exit ( )  in  that  the  list 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										 *  entry  won ' t  be  deleted  though  the  process  has  exited . 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										 */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										if  ( ! ( p - > flags  &  PF_EXITING )  & &  list_empty ( & p - > cg_list ) ) 
							 
						 
					
						
							
								
									
										
										
										
											2008-02-07 00:14:42 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
											list_add ( & p - > cg_list ,  & p - > cgroups - > tasks ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										task_unlock ( p ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									}  while_each_thread ( g ,  p ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2012-02-08 03:37:27 +01:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									read_unlock ( & tasklist_lock ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2008-02-07 00:14:42 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									write_unlock ( & css_set_lock ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2012-11-09 09:12:29 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								/**
  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  cgroup_next_descendant_pre  -  find  the  next  descendant  for  pre - order  walk 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  @ pos :  the  current  position  ( % NULL  to  initiate  traversal ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  @ cgroup :  cgroup  whose  descendants  to  walk 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 * 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  To  be  used  by  cgroup_for_each_descendant_pre ( ) .   Find  the  next 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  descendant  to  visit  for  pre - order  traversal  of  @ cgroup ' s  descendants . 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								struct  cgroup  * cgroup_next_descendant_pre ( struct  cgroup  * pos ,  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
													  struct  cgroup  * cgroup ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									struct  cgroup  * next ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									WARN_ON_ONCE ( ! rcu_read_lock_held ( ) ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									/* if first iteration, pretend we just visited @cgroup */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									if  ( ! pos )  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										if  ( list_empty ( & cgroup - > children ) ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											return  NULL ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										pos  =  cgroup ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									} 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									/* visit the first child if exists */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									next  =  list_first_or_null_rcu ( & pos - > children ,  struct  cgroup ,  sibling ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									if  ( next ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										return  next ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									/* no child, visit my or the closest ancestor's next sibling */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									do  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										next  =  list_entry_rcu ( pos - > sibling . next ,  struct  cgroup , 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
												      sibling ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										if  ( & next - > sibling  ! =  & pos - > parent - > children ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											return  next ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										pos  =  pos - > parent ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									}  while  ( pos  ! =  cgroup ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									return  NULL ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								EXPORT_SYMBOL_GPL ( cgroup_next_descendant_pre ) ;  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2013-01-07 08:49:33 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								/**
  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  cgroup_rightmost_descendant  -  return  the  rightmost  descendant  of  a  cgroup 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  @ pos :  cgroup  of  interest 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 * 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  Return  the  rightmost  descendant  of  @ pos .   If  there ' s  no  descendant , 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  @ pos  is  returned .   This  can  be  used  during  pre - order  traversal  to  skip 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  subtree  of  @ pos . 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								struct  cgroup  * cgroup_rightmost_descendant ( struct  cgroup  * pos )  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									struct  cgroup  * last ,  * tmp ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									WARN_ON_ONCE ( ! rcu_read_lock_held ( ) ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									do  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										last  =  pos ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										/* ->prev isn't RCU safe, walk ->next till the end */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										pos  =  NULL ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										list_for_each_entry_rcu ( tmp ,  & last - > children ,  sibling ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											pos  =  tmp ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									}  while  ( pos ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									return  last ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								EXPORT_SYMBOL_GPL ( cgroup_rightmost_descendant ) ;  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2012-11-09 09:12:29 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								static  struct  cgroup  * cgroup_leftmost_descendant ( struct  cgroup  * pos )  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									struct  cgroup  * last ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									do  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										last  =  pos ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										pos  =  list_first_or_null_rcu ( & pos - > children ,  struct  cgroup , 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
													     sibling ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									}  while  ( pos ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									return  last ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								/**
  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  cgroup_next_descendant_post  -  find  the  next  descendant  for  post - order  walk 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  @ pos :  the  current  position  ( % NULL  to  initiate  traversal ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  @ cgroup :  cgroup  whose  descendants  to  walk 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 * 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  To  be  used  by  cgroup_for_each_descendant_post ( ) .   Find  the  next 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  descendant  to  visit  for  post - order  traversal  of  @ cgroup ' s  descendants . 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								struct  cgroup  * cgroup_next_descendant_post ( struct  cgroup  * pos ,  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
													   struct  cgroup  * cgroup ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									struct  cgroup  * next ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									WARN_ON_ONCE ( ! rcu_read_lock_held ( ) ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									/* if first iteration, visit the leftmost descendant */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									if  ( ! pos )  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										next  =  cgroup_leftmost_descendant ( cgroup ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										return  next  ! =  cgroup  ?  next  :  NULL ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									} 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									/* if there's an unvisited sibling, visit its leftmost descendant */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									next  =  list_entry_rcu ( pos - > sibling . next ,  struct  cgroup ,  sibling ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									if  ( & next - > sibling  ! =  & pos - > parent - > children ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										return  cgroup_leftmost_descendant ( next ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									/* no sibling left, visit parent */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									next  =  pos - > parent ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									return  next  ! =  cgroup  ?  next  :  NULL ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								EXPORT_SYMBOL_GPL ( cgroup_next_descendant_post ) ;  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:40:44 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								void  cgroup_iter_start ( struct  cgroup  * cgrp ,  struct  cgroup_iter  * it )  
						 
					
						
							
								
									
										
										
										
											2011-12-27 07:46:26 +02:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									__acquires ( css_set_lock ) 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:36 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									/*
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  The  first  time  anyone  tries  to  iterate  across  a  cgroup , 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  we  need  to  enable  the  list  linking  each  css_set  to  its 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  tasks ,  and  fix  up  all  existing  tasks . 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 */ 
							 
						 
					
						
							
								
									
										
										
										
											2008-02-07 00:14:42 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									if  ( ! use_task_css_set_links ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										cgroup_enable_task_cg_lists ( ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:36 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									read_lock ( & css_set_lock ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:40:44 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									it - > cg_link  =  & cgrp - > css_sets ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									cgroup_advance_iter ( cgrp ,  it ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:36 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:40:44 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								struct  task_struct  * cgroup_iter_next ( struct  cgroup  * cgrp ,  
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:36 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
													struct  cgroup_iter  * it ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									struct  task_struct  * res ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									struct  list_head  * l  =  it - > task ; 
							 
						 
					
						
							
								
									
										
										
										
											2009-01-07 18:07:36 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									struct  cg_cgroup_link  * link ; 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:36 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									/* If the iterator cg is NULL, we have no tasks */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									if  ( ! it - > cg_link ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										return  NULL ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									res  =  list_entry ( l ,  struct  task_struct ,  cg_list ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									/* Advance iterator to find next entry */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									l  =  l - > next ; 
							 
						 
					
						
							
								
									
										
										
										
											2009-01-07 18:07:36 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									link  =  list_entry ( it - > cg_link ,  struct  cg_cgroup_link ,  cgrp_link_list ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									if  ( l  = =  & link - > cg - > tasks )  { 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:36 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										/* We reached the end of this task list - move on to
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										 *  the  next  cg_cgroup_link  */ 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:40:44 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										cgroup_advance_iter ( cgrp ,  it ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:36 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									}  else  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										it - > task  =  l ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									} 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									return  res ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:40:44 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								void  cgroup_iter_end ( struct  cgroup  * cgrp ,  struct  cgroup_iter  * it )  
						 
					
						
							
								
									
										
										
										
											2011-12-27 07:46:26 +02:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									__releases ( css_set_lock ) 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:36 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									read_unlock ( & css_set_lock ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2008-02-07 00:14:42 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								static  inline  int  started_after_time ( struct  task_struct  * t1 ,  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
												     struct  timespec  * time , 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
												     struct  task_struct  * t2 ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									int  start_diff  =  timespec_compare ( & t1 - > start_time ,  time ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									if  ( start_diff  >  0 )  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										return  1 ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									}  else  if  ( start_diff  <  0 )  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										return  0 ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									}  else  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										/*
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										 *  Arbitrarily ,  if  two  processes  started  at  the  same 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										 *  time ,  we ' ll  say  that  the  lower  pointer  value 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										 *  started  first .  Note  that  t2  may  have  exited  by  now 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										 *  so  this  may  not  be  a  valid  pointer  any  longer ,  but 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										 *  that ' s  fine  -  it  still  serves  to  distinguish 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										 *  between  two  tasks  started  ( effectively )  simultaneously . 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										 */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										return  t1  >  t2 ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									} 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								/*
  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  This  function  is  a  callback  from  heap_insert ( )  and  is  used  to  order 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  the  heap . 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  In  this  case  we  order  the  heap  in  descending  task  start  time . 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								static  inline  int  started_after ( void  * p1 ,  void  * p2 )  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									struct  task_struct  * t1  =  p1 ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									struct  task_struct  * t2  =  p2 ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									return  started_after_time ( t1 ,  & t2 - > start_time ,  t2 ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								/**
  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  cgroup_scan_tasks  -  iterate  though  all  the  tasks  in  a  cgroup 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  @ scan :  struct  cgroup_scanner  containing  arguments  for  the  scan 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 * 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  Arguments  include  pointers  to  callback  functions  test_task ( )  and 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  process_task ( ) . 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  Iterate  through  all  the  tasks  in  a  cgroup ,  calling  test_task ( )  for  each , 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  and  if  it  returns  true ,  call  process_task ( )  for  it  also . 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  The  test_task  pointer  may  be  NULL ,  meaning  always  true  ( select  all  tasks ) . 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  Effectively  duplicates  cgroup_iter_ { start , next , end } ( ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  but  does  not  lock  css_set_lock  for  the  call  to  process_task ( ) . 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  The  struct  cgroup_scanner  may  be  embedded  in  any  structure  of  the  caller ' s 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  creation . 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  It  is  guaranteed  that  process_task ( )  will  act  on  every  task  that 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  is  a  member  of  the  cgroup  for  the  duration  of  this  call .  This 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  function  may  or  may  not  call  process_task ( )  for  tasks  that  exit 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  or  move  to  a  different  cgroup  during  the  call ,  or  are  forked  or 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  move  into  the  cgroup  during  the  call . 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 * 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  Note  that  test_task ( )  may  be  called  with  locks  held ,  and  may  in  some 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  situations  be  called  multiple  times  for  the  same  task ,  so  it  should 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  be  cheap . 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  If  the  heap  pointer  in  the  struct  cgroup_scanner  is  non - NULL ,  a  heap  has  been 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  pre - allocated  and  will  be  used  for  heap  operations  ( and  its  " gt "  member  will 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  be  overwritten ) ,  else  a  temporary  heap  will  be  used  ( allocation  of  which 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  may  cause  this  function  to  fail ) . 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								int  cgroup_scan_tasks ( struct  cgroup_scanner  * scan )  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									int  retval ,  i ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									struct  cgroup_iter  it ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									struct  task_struct  * p ,  * dropped ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									/* Never dereference latest_task, since it's not refcounted */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									struct  task_struct  * latest_task  =  NULL ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									struct  ptr_heap  tmp_heap ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									struct  ptr_heap  * heap ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									struct  timespec  latest_time  =  {  0 ,  0  } ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									if  ( scan - > heap )  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										/* The caller supplied our heap and pre-allocated its memory */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										heap  =  scan - > heap ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										heap - > gt  =  & started_after ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									}  else  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										/* We need to allocate our own heap memory */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										heap  =  & tmp_heap ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										retval  =  heap_init ( heap ,  PAGE_SIZE ,  GFP_KERNEL ,  & started_after ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										if  ( retval ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											/* cannot allocate the heap */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											return  retval ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									} 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 again : 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									/*
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  Scan  tasks  in  the  cgroup ,  using  the  scanner ' s  " test_task "  callback 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  to  determine  which  are  of  interest ,  and  using  the  scanner ' s 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  " process_task "  callback  to  process  any  of  them  that  need  an  update . 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  Since  we  don ' t  want  to  hold  any  locks  during  the  task  updates , 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  gather  tasks  to  be  processed  in  a  heap  structure . 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  The  heap  is  sorted  by  descending  task  start  time . 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  If  the  statically - sized  heap  fills  up ,  we  overflow  tasks  that 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  started  later ,  and  in  future  iterations  only  consider  tasks  that 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  started  after  the  latest  task  in  the  previous  pass .  This 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  guarantees  forward  progress  and  that  we  don ' t  miss  any  tasks . 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									heap - > size  =  0 ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									cgroup_iter_start ( scan - > cg ,  & it ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									while  ( ( p  =  cgroup_iter_next ( scan - > cg ,  & it ) ) )  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										/*
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										 *  Only  affect  tasks  that  qualify  per  the  caller ' s  callback , 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										 *  if  he  provided  one 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										 */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										if  ( scan - > test_task  & &  ! scan - > test_task ( p ,  scan ) ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											continue ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										/*
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										 *  Only  process  tasks  that  started  after  the  last  task 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										 *  we  processed 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										 */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										if  ( ! started_after_time ( p ,  & latest_time ,  latest_task ) ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											continue ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										dropped  =  heap_insert ( heap ,  p ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										if  ( dropped  = =  NULL )  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											/*
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											 *  The  new  task  was  inserted ;  the  heap  wasn ' t 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											 *  previously  full 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											 */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											get_task_struct ( p ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										}  else  if  ( dropped  ! =  p )  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											/*
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											 *  The  new  task  was  inserted ,  and  pushed  out  a 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											 *  different  task 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											 */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											get_task_struct ( p ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											put_task_struct ( dropped ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										} 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										/*
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										 *  Else  the  new  task  was  newer  than  anything  already  in 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										 *  the  heap  and  wasn ' t  inserted 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										 */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									} 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									cgroup_iter_end ( scan - > cg ,  & it ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									if  ( heap - > size )  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										for  ( i  =  0 ;  i  <  heap - > size ;  i + + )  { 
							 
						 
					
						
							
								
									
										
										
										
											2008-04-29 00:59:55 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
											struct  task_struct  * q  =  heap - > ptrs [ i ] ; 
							 
						 
					
						
							
								
									
										
										
										
											2008-02-07 00:14:42 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
											if  ( i  = =  0 )  { 
							 
						 
					
						
							
								
									
										
										
										
											2008-04-29 00:59:55 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
												latest_time  =  q - > start_time ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
												latest_task  =  q ; 
							 
						 
					
						
							
								
									
										
										
										
											2008-02-07 00:14:42 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
											} 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											/* Process the task per the caller's callback */ 
							 
						 
					
						
							
								
									
										
										
										
											2008-04-29 00:59:55 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
											scan - > process_task ( q ,  scan ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											put_task_struct ( q ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2008-02-07 00:14:42 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										} 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										/*
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										 *  If  we  had  to  process  any  tasks  at  all ,  scan  again 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										 *  in  case  some  of  them  were  in  the  middle  of  forking 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										 *  children  that  didn ' t  get  processed . 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										 *  Not  the  most  efficient  way  to  do  it ,  but  it  avoids 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										 *  having  to  take  callback_mutex  in  the  fork  path 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										 */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										goto  again ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									} 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									if  ( heap  = =  & tmp_heap ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										heap_free ( & tmp_heap ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									return  0 ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:32 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								/*
  
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:26 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								 *  Stuff  for  reading  the  ' tasks ' / ' procs '  files . 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:32 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								 * 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  Reading  this  file  can  return  large  amounts  of  data  if  a  cgroup  has 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  * lots *  of  attached  tasks .  So  it  may  need  several  calls  to  read ( ) , 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  but  we  cannot  guarantee  that  the  information  we  produce  is  correct 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  unless  we  produce  it  entirely  atomically . 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 * 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2012-01-20 11:58:43 +08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								/* which pidlist file are we talking about? */  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								enum  cgroup_filetype  {  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									CGROUP_FILE_PROCS , 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									CGROUP_FILE_TASKS , 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								} ;  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								/*
  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  A  pidlist  is  a  list  of  pids  that  virtually  represents  the  contents  of  one 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  of  the  cgroup  files  ( " procs "  or  " tasks " ) .  We  keep  a  list  of  such  pidlists , 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  a  pair  ( one  each  for  procs ,  tasks )  for  each  pid  namespace  that ' s  relevant 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  to  the  cgroup . 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								struct  cgroup_pidlist  {  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									/*
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  used  to  find  which  pidlist  is  wanted .  doesn ' t  change  as  long  as 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  this  particular  list  stays  in  the  list . 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									*/ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									struct  {  enum  cgroup_filetype  type ;  struct  pid_namespace  * ns ;  }  key ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									/* array of xids */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									pid_t  * list ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									/* how many elements the above list has */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									int  length ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									/* how many files are using the current array */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									int  use_count ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									/* each of these stored in a list by its cgroup */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									struct  list_head  links ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									/* pointer to the cgroup we belong to, for list removal purposes */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									struct  cgroup  * owner ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									/* protects the other fields */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									struct  rw_semaphore  mutex ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								} ;  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:28 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								/*
  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  The  following  two  functions  " fix "  the  issue  where  there  are  more  pids 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  than  kmalloc  will  give  memory  for ;  in  such  cases ,  we  use  vmalloc / vfree . 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  TODO :  replace  with  a  kernel - wide  solution  to  this  problem 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								# define PIDLIST_TOO_LARGE(c) ((c) * sizeof(pid_t) > (PAGE_SIZE * 2)) 
  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								static  void  * pidlist_allocate ( int  count )  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									if  ( PIDLIST_TOO_LARGE ( count ) ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										return  vmalloc ( count  *  sizeof ( pid_t ) ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									else 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										return  kmalloc ( count  *  sizeof ( pid_t ) ,  GFP_KERNEL ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								static  void  pidlist_free ( void  * p )  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									if  ( is_vmalloc_addr ( p ) ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										vfree ( p ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									else 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										kfree ( p ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								static  void  * pidlist_resize ( void  * p ,  int  newcount )  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									void  * newlist ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									/* note: if new alloc fails, old p will still be valid either way */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									if  ( is_vmalloc_addr ( p ) )  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										newlist  =  vmalloc ( newcount  *  sizeof ( pid_t ) ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										if  ( ! newlist ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											return  NULL ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										memcpy ( newlist ,  p ,  newcount  *  sizeof ( pid_t ) ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										vfree ( p ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									}  else  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										newlist  =  krealloc ( p ,  newcount  *  sizeof ( pid_t ) ,  GFP_KERNEL ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									} 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									return  newlist ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:32 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								/*
  
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:26 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								 *  pidlist_uniq  -  given  a  kmalloc ( ) ed  list ,  strip  out  all  duplicate  entries 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  If  the  new  stripped  list  is  sufficiently  smaller  and  there ' s  enough  memory 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  to  allocate  a  new  buffer ,  will  let  go  of  the  unneeded  memory .  Returns  the 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  number  of  unique  elements . 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:32 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								 */ 
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:26 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								/* is the size difference enough that we should re-allocate the array? */  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								# define PIDLIST_REALLOC_DIFFERENCE(old, new) ((old) - PAGE_SIZE >= (new)) 
  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								static  int  pidlist_uniq ( pid_t  * * p ,  int  length )  
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:32 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:26 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									int  src ,  dest  =  1 ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									pid_t  * list  =  * p ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									pid_t  * newlist ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									/*
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  we  presume  the  0 th  element  is  unique ,  so  i  starts  at  1.  trivial 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  edge  cases  first ;  no  work  needs  to  be  done  for  either 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									if  ( length  = =  0  | |  length  = =  1 ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										return  length ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									/* src and dest walk down the list; dest counts unique elements */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									for  ( src  =  1 ;  src  <  length ;  src + + )  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										/* find next unique element */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										while  ( list [ src ]  = =  list [ src - 1 ] )  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											src + + ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											if  ( src  = =  length ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
												goto  after ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										} 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										/* dest always points to where the next unique element goes */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										list [ dest ]  =  list [ src ] ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										dest + + ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									} 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								after :  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									/*
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  if  the  length  difference  is  large  enough ,  we  want  to  allocate  a 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  smaller  buffer  to  save  memory .  if  this  fails  due  to  out  of  memory , 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  we ' ll  just  stay  with  what  we ' ve  got . 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									if  ( PIDLIST_REALLOC_DIFFERENCE ( length ,  dest ) )  { 
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:28 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										newlist  =  pidlist_resize ( list ,  dest ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:26 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										if  ( newlist ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											* p  =  newlist ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									} 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									return  dest ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								static  int  cmppid ( const  void  * a ,  const  void  * b )  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									return  * ( pid_t  * ) a  -  * ( pid_t  * ) b ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:27 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								/*
  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  find  the  appropriate  pidlist  for  our  purpose  ( given  procs  vs  tasks ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  returns  with  the  lock  on  that  pidlist  already  held ,  and  takes  care 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  of  the  use  count ,  or  returns  NULL  with  no  locks  held  if  we ' re  out  of 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  memory . 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								static  struct  cgroup_pidlist  * cgroup_pidlist_find ( struct  cgroup  * cgrp ,  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
														  enum  cgroup_filetype  type ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									struct  cgroup_pidlist  * l ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									/* don't need task_nsproxy() if we're looking at ourself */ 
							 
						 
					
						
							
								
									
										
										
										
											2010-03-02 14:51:53 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									struct  pid_namespace  * ns  =  task_active_pid_ns ( current ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2010-03-10 15:22:12 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:27 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									/*
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  We  can ' t  drop  the  pidlist_mutex  before  taking  the  l - > mutex  in  case 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  the  last  ref - holder  is  trying  to  remove  l  from  the  list  at  the  same 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  time .  Holding  the  pidlist_mutex  precludes  somebody  taking  whichever 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  list  we  find  out  from  under  us  -  compare  release_pid_array ( ) . 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									mutex_lock ( & cgrp - > pidlist_mutex ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									list_for_each_entry ( l ,  & cgrp - > pidlists ,  links )  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										if  ( l - > key . type  = =  type  & &  l - > key . ns  = =  ns )  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											/* make sure l doesn't vanish out from under us */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											down_write ( & l - > mutex ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											mutex_unlock ( & cgrp - > pidlist_mutex ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											return  l ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										} 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									} 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									/* entry not found; create a new one */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									l  =  kmalloc ( sizeof ( struct  cgroup_pidlist ) ,  GFP_KERNEL ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									if  ( ! l )  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										mutex_unlock ( & cgrp - > pidlist_mutex ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										return  l ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									} 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									init_rwsem ( & l - > mutex ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									down_write ( & l - > mutex ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									l - > key . type  =  type ; 
							 
						 
					
						
							
								
									
										
										
										
											2010-03-10 15:22:12 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									l - > key . ns  =  get_pid_ns ( ns ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:27 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									l - > use_count  =  0 ;  /* don't increment here */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									l - > list  =  NULL ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									l - > owner  =  cgrp ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									list_add ( & l - > links ,  & cgrp - > pidlists ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									mutex_unlock ( & cgrp - > pidlist_mutex ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									return  l ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:26 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								/*
  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  Load  a  cgroup ' s  pidarray  with  either  procs '  tgids  or  tasks '  pids 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 */ 
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:27 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								static  int  pidlist_array_load ( struct  cgroup  * cgrp ,  enum  cgroup_filetype  type ,  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											      struct  cgroup_pidlist  * * lp ) 
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:26 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									pid_t  * array ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									int  length ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									int  pid ,  n  =  0 ;  /* used for populating the array */ 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:36 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									struct  cgroup_iter  it ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									struct  task_struct  * tsk ; 
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:26 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									struct  cgroup_pidlist  * l ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									/*
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  If  cgroup  gets  more  users  after  we  read  count ,  we  won ' t  have 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  enough  space  -  tough .   This  race  is  indistinguishable  to  the 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  caller  from  the  case  that  the  additional  cgroup  users  didn ' t 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  show  up  until  sometime  later  on . 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									length  =  cgroup_task_count ( cgrp ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:28 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									array  =  pidlist_allocate ( length ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:26 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									if  ( ! array ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										return  - ENOMEM ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									/* now, populate the array */ 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:40:44 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									cgroup_iter_start ( cgrp ,  & it ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									while  ( ( tsk  =  cgroup_iter_next ( cgrp ,  & it ) ) )  { 
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:26 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										if  ( unlikely ( n  = =  length ) ) 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:36 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
											break ; 
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:26 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										/* get tgid or pid for procs or tasks file respectively */ 
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:27 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										if  ( type  = =  CGROUP_FILE_PROCS ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											pid  =  task_tgid_vnr ( tsk ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										else 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											pid  =  task_pid_vnr ( tsk ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:26 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										if  ( pid  >  0 )  /* make sure to only use valid results */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											array [ n + + ]  =  pid ; 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:36 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									} 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:40:44 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									cgroup_iter_end ( cgrp ,  & it ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:26 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									length  =  n ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									/* now sort & (if procs) strip out duplicates */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									sort ( array ,  length ,  sizeof ( pid_t ) ,  cmppid ,  NULL ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:27 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									if  ( type  = =  CGROUP_FILE_PROCS ) 
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:26 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										length  =  pidlist_uniq ( & array ,  length ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:27 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									l  =  cgroup_pidlist_find ( cgrp ,  type ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									if  ( ! l )  { 
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:28 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										pidlist_free ( array ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:27 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										return  - ENOMEM ; 
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:26 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									} 
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:27 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									/* store array, freeing old if necessary - lock already held */ 
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:28 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									pidlist_free ( l - > list ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:26 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									l - > list  =  array ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									l - > length  =  length ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									l - > use_count + + ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									up_write ( & l - > mutex ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:27 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									* lp  =  l ; 
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:26 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									return  0 ; 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:32 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
											 
										
											
												Add cgroupstats
This patch is inspired by the discussion at
http://lkml.org/lkml/2007/4/11/187 and implements per cgroup statistics
as suggested by Andrew Morton in http://lkml.org/lkml/2007/4/11/263.  The
patch is on top of 2.6.21-mm1 with Paul's cgroups v9 patches (forward
ported)
This patch implements per cgroup statistics infrastructure and re-uses
code from the taskstats interface.  A new set of cgroup operations are
registered with commands and attributes.  It should be very easy to
*extend* per cgroup statistics, by adding members to the cgroupstats
structure.
The current model for cgroupstats is a pull, a push model (to post
statistics on interesting events), should be very easy to add.  Currently
user space requests for statistics by passing the cgroup file
descriptor.  Statistics about the state of all the tasks in the cgroup
is returned to user space.
TODO's/NOTE:
This patch provides an infrastructure for implementing cgroup statistics.
Based on the needs of each controller, we can incrementally add more statistics,
event based support for notification of statistics, accumulation of taskstats
into cgroup statistics in the future.
Sample output
# ./cgroupstats -C /cgroup/a
sleeping 2, blocked 0, running 1, stopped 0, uninterruptible 0
# ./cgroupstats -C /cgroup/
sleeping 154, blocked 0, running 0, stopped 0, uninterruptible 0
If the approach looks good, I'll enhance and post the user space utility for
the same
Feedback, comments, test results are always welcome!
[akpm@linux-foundation.org: build fix]
Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
Cc: Paul Menage <menage@google.com>
Cc: Jay Lan <jlan@engr.sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:44 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								/**
  
						 
					
						
							
								
									
										
										
										
											2008-02-23 15:24:09 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								 *  cgroupstats_build  -  build  and  fill  cgroupstats 
							 
						 
					
						
							
								
									
										
											 
										
											
												Add cgroupstats
This patch is inspired by the discussion at
http://lkml.org/lkml/2007/4/11/187 and implements per cgroup statistics
as suggested by Andrew Morton in http://lkml.org/lkml/2007/4/11/263.  The
patch is on top of 2.6.21-mm1 with Paul's cgroups v9 patches (forward
ported)
This patch implements per cgroup statistics infrastructure and re-uses
code from the taskstats interface.  A new set of cgroup operations are
registered with commands and attributes.  It should be very easy to
*extend* per cgroup statistics, by adding members to the cgroupstats
structure.
The current model for cgroupstats is a pull, a push model (to post
statistics on interesting events), should be very easy to add.  Currently
user space requests for statistics by passing the cgroup file
descriptor.  Statistics about the state of all the tasks in the cgroup
is returned to user space.
TODO's/NOTE:
This patch provides an infrastructure for implementing cgroup statistics.
Based on the needs of each controller, we can incrementally add more statistics,
event based support for notification of statistics, accumulation of taskstats
into cgroup statistics in the future.
Sample output
# ./cgroupstats -C /cgroup/a
sleeping 2, blocked 0, running 1, stopped 0, uninterruptible 0
# ./cgroupstats -C /cgroup/
sleeping 154, blocked 0, running 0, stopped 0, uninterruptible 0
If the approach looks good, I'll enhance and post the user space utility for
the same
Feedback, comments, test results are always welcome!
[akpm@linux-foundation.org: build fix]
Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
Cc: Paul Menage <menage@google.com>
Cc: Jay Lan <jlan@engr.sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:44 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								 *  @ stats :  cgroupstats  to  fill  information  into 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  @ dentry :  A  dentry  entry  belonging  to  the  cgroup  for  which  stats  have 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  been  requested . 
							 
						 
					
						
							
								
									
										
										
										
											2008-02-23 15:24:09 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								 * 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  Build  and  fill  cgroupstats  so  that  taskstats  can  export  it  to  user 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  space . 
							 
						 
					
						
							
								
									
										
											 
										
											
												Add cgroupstats
This patch is inspired by the discussion at
http://lkml.org/lkml/2007/4/11/187 and implements per cgroup statistics
as suggested by Andrew Morton in http://lkml.org/lkml/2007/4/11/263.  The
patch is on top of 2.6.21-mm1 with Paul's cgroups v9 patches (forward
ported)
This patch implements per cgroup statistics infrastructure and re-uses
code from the taskstats interface.  A new set of cgroup operations are
registered with commands and attributes.  It should be very easy to
*extend* per cgroup statistics, by adding members to the cgroupstats
structure.
The current model for cgroupstats is a pull, a push model (to post
statistics on interesting events), should be very easy to add.  Currently
user space requests for statistics by passing the cgroup file
descriptor.  Statistics about the state of all the tasks in the cgroup
is returned to user space.
TODO's/NOTE:
This patch provides an infrastructure for implementing cgroup statistics.
Based on the needs of each controller, we can incrementally add more statistics,
event based support for notification of statistics, accumulation of taskstats
into cgroup statistics in the future.
Sample output
# ./cgroupstats -C /cgroup/a
sleeping 2, blocked 0, running 1, stopped 0, uninterruptible 0
# ./cgroupstats -C /cgroup/
sleeping 154, blocked 0, running 0, stopped 0, uninterruptible 0
If the approach looks good, I'll enhance and post the user space utility for
the same
Feedback, comments, test results are always welcome!
[akpm@linux-foundation.org: build fix]
Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
Cc: Paul Menage <menage@google.com>
Cc: Jay Lan <jlan@engr.sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:44 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								 */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								int  cgroupstats_build ( struct  cgroupstats  * stats ,  struct  dentry  * dentry )  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									int  ret  =  - EINVAL ; 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:40:44 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									struct  cgroup  * cgrp ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												Add cgroupstats
This patch is inspired by the discussion at
http://lkml.org/lkml/2007/4/11/187 and implements per cgroup statistics
as suggested by Andrew Morton in http://lkml.org/lkml/2007/4/11/263.  The
patch is on top of 2.6.21-mm1 with Paul's cgroups v9 patches (forward
ported)
This patch implements per cgroup statistics infrastructure and re-uses
code from the taskstats interface.  A new set of cgroup operations are
registered with commands and attributes.  It should be very easy to
*extend* per cgroup statistics, by adding members to the cgroupstats
structure.
The current model for cgroupstats is a pull, a push model (to post
statistics on interesting events), should be very easy to add.  Currently
user space requests for statistics by passing the cgroup file
descriptor.  Statistics about the state of all the tasks in the cgroup
is returned to user space.
TODO's/NOTE:
This patch provides an infrastructure for implementing cgroup statistics.
Based on the needs of each controller, we can incrementally add more statistics,
event based support for notification of statistics, accumulation of taskstats
into cgroup statistics in the future.
Sample output
# ./cgroupstats -C /cgroup/a
sleeping 2, blocked 0, running 1, stopped 0, uninterruptible 0
# ./cgroupstats -C /cgroup/
sleeping 154, blocked 0, running 0, stopped 0, uninterruptible 0
If the approach looks good, I'll enhance and post the user space utility for
the same
Feedback, comments, test results are always welcome!
[akpm@linux-foundation.org: build fix]
Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
Cc: Paul Menage <menage@google.com>
Cc: Jay Lan <jlan@engr.sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:44 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									struct  cgroup_iter  it ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									struct  task_struct  * tsk ; 
							 
						 
					
						
							
								
									
										
										
										
											2008-11-19 15:36:48 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
											 
										
											
												Add cgroupstats
This patch is inspired by the discussion at
http://lkml.org/lkml/2007/4/11/187 and implements per cgroup statistics
as suggested by Andrew Morton in http://lkml.org/lkml/2007/4/11/263.  The
patch is on top of 2.6.21-mm1 with Paul's cgroups v9 patches (forward
ported)
This patch implements per cgroup statistics infrastructure and re-uses
code from the taskstats interface.  A new set of cgroup operations are
registered with commands and attributes.  It should be very easy to
*extend* per cgroup statistics, by adding members to the cgroupstats
structure.
The current model for cgroupstats is a pull, a push model (to post
statistics on interesting events), should be very easy to add.  Currently
user space requests for statistics by passing the cgroup file
descriptor.  Statistics about the state of all the tasks in the cgroup
is returned to user space.
TODO's/NOTE:
This patch provides an infrastructure for implementing cgroup statistics.
Based on the needs of each controller, we can incrementally add more statistics,
event based support for notification of statistics, accumulation of taskstats
into cgroup statistics in the future.
Sample output
# ./cgroupstats -C /cgroup/a
sleeping 2, blocked 0, running 1, stopped 0, uninterruptible 0
# ./cgroupstats -C /cgroup/
sleeping 154, blocked 0, running 0, stopped 0, uninterruptible 0
If the approach looks good, I'll enhance and post the user space utility for
the same
Feedback, comments, test results are always welcome!
[akpm@linux-foundation.org: build fix]
Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
Cc: Paul Menage <menage@google.com>
Cc: Jay Lan <jlan@engr.sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:44 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									/*
 
							 
						 
					
						
							
								
									
										
										
										
											2008-11-19 15:36:48 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									 *  Validate  dentry  by  checking  the  superblock  operations , 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  and  make  sure  it ' s  a  directory . 
							 
						 
					
						
							
								
									
										
											 
										
											
												Add cgroupstats
This patch is inspired by the discussion at
http://lkml.org/lkml/2007/4/11/187 and implements per cgroup statistics
as suggested by Andrew Morton in http://lkml.org/lkml/2007/4/11/263.  The
patch is on top of 2.6.21-mm1 with Paul's cgroups v9 patches (forward
ported)
This patch implements per cgroup statistics infrastructure and re-uses
code from the taskstats interface.  A new set of cgroup operations are
registered with commands and attributes.  It should be very easy to
*extend* per cgroup statistics, by adding members to the cgroupstats
structure.
The current model for cgroupstats is a pull, a push model (to post
statistics on interesting events), should be very easy to add.  Currently
user space requests for statistics by passing the cgroup file
descriptor.  Statistics about the state of all the tasks in the cgroup
is returned to user space.
TODO's/NOTE:
This patch provides an infrastructure for implementing cgroup statistics.
Based on the needs of each controller, we can incrementally add more statistics,
event based support for notification of statistics, accumulation of taskstats
into cgroup statistics in the future.
Sample output
# ./cgroupstats -C /cgroup/a
sleeping 2, blocked 0, running 1, stopped 0, uninterruptible 0
# ./cgroupstats -C /cgroup/
sleeping 154, blocked 0, running 0, stopped 0, uninterruptible 0
If the approach looks good, I'll enhance and post the user space utility for
the same
Feedback, comments, test results are always welcome!
[akpm@linux-foundation.org: build fix]
Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
Cc: Paul Menage <menage@google.com>
Cc: Jay Lan <jlan@engr.sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:44 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									 */ 
							 
						 
					
						
							
								
									
										
										
										
											2008-11-19 15:36:48 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									if  ( dentry - > d_sb - > s_op  ! =  & cgroup_ops  | | 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									    ! S_ISDIR ( dentry - > d_inode - > i_mode ) ) 
							 
						 
					
						
							
								
									
										
											 
										
											
												Add cgroupstats
This patch is inspired by the discussion at
http://lkml.org/lkml/2007/4/11/187 and implements per cgroup statistics
as suggested by Andrew Morton in http://lkml.org/lkml/2007/4/11/263.  The
patch is on top of 2.6.21-mm1 with Paul's cgroups v9 patches (forward
ported)
This patch implements per cgroup statistics infrastructure and re-uses
code from the taskstats interface.  A new set of cgroup operations are
registered with commands and attributes.  It should be very easy to
*extend* per cgroup statistics, by adding members to the cgroupstats
structure.
The current model for cgroupstats is a pull, a push model (to post
statistics on interesting events), should be very easy to add.  Currently
user space requests for statistics by passing the cgroup file
descriptor.  Statistics about the state of all the tasks in the cgroup
is returned to user space.
TODO's/NOTE:
This patch provides an infrastructure for implementing cgroup statistics.
Based on the needs of each controller, we can incrementally add more statistics,
event based support for notification of statistics, accumulation of taskstats
into cgroup statistics in the future.
Sample output
# ./cgroupstats -C /cgroup/a
sleeping 2, blocked 0, running 1, stopped 0, uninterruptible 0
# ./cgroupstats -C /cgroup/
sleeping 154, blocked 0, running 0, stopped 0, uninterruptible 0
If the approach looks good, I'll enhance and post the user space utility for
the same
Feedback, comments, test results are always welcome!
[akpm@linux-foundation.org: build fix]
Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
Cc: Paul Menage <menage@google.com>
Cc: Jay Lan <jlan@engr.sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:44 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										 goto  err ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									ret  =  0 ; 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:40:44 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									cgrp  =  dentry - > d_fsdata ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												Add cgroupstats
This patch is inspired by the discussion at
http://lkml.org/lkml/2007/4/11/187 and implements per cgroup statistics
as suggested by Andrew Morton in http://lkml.org/lkml/2007/4/11/263.  The
patch is on top of 2.6.21-mm1 with Paul's cgroups v9 patches (forward
ported)
This patch implements per cgroup statistics infrastructure and re-uses
code from the taskstats interface.  A new set of cgroup operations are
registered with commands and attributes.  It should be very easy to
*extend* per cgroup statistics, by adding members to the cgroupstats
structure.
The current model for cgroupstats is a pull, a push model (to post
statistics on interesting events), should be very easy to add.  Currently
user space requests for statistics by passing the cgroup file
descriptor.  Statistics about the state of all the tasks in the cgroup
is returned to user space.
TODO's/NOTE:
This patch provides an infrastructure for implementing cgroup statistics.
Based on the needs of each controller, we can incrementally add more statistics,
event based support for notification of statistics, accumulation of taskstats
into cgroup statistics in the future.
Sample output
# ./cgroupstats -C /cgroup/a
sleeping 2, blocked 0, running 1, stopped 0, uninterruptible 0
# ./cgroupstats -C /cgroup/
sleeping 154, blocked 0, running 0, stopped 0, uninterruptible 0
If the approach looks good, I'll enhance and post the user space utility for
the same
Feedback, comments, test results are always welcome!
[akpm@linux-foundation.org: build fix]
Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
Cc: Paul Menage <menage@google.com>
Cc: Jay Lan <jlan@engr.sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:44 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:40:44 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									cgroup_iter_start ( cgrp ,  & it ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									while  ( ( tsk  =  cgroup_iter_next ( cgrp ,  & it ) ) )  { 
							 
						 
					
						
							
								
									
										
											 
										
											
												Add cgroupstats
This patch is inspired by the discussion at
http://lkml.org/lkml/2007/4/11/187 and implements per cgroup statistics
as suggested by Andrew Morton in http://lkml.org/lkml/2007/4/11/263.  The
patch is on top of 2.6.21-mm1 with Paul's cgroups v9 patches (forward
ported)
This patch implements per cgroup statistics infrastructure and re-uses
code from the taskstats interface.  A new set of cgroup operations are
registered with commands and attributes.  It should be very easy to
*extend* per cgroup statistics, by adding members to the cgroupstats
structure.
The current model for cgroupstats is a pull, a push model (to post
statistics on interesting events), should be very easy to add.  Currently
user space requests for statistics by passing the cgroup file
descriptor.  Statistics about the state of all the tasks in the cgroup
is returned to user space.
TODO's/NOTE:
This patch provides an infrastructure for implementing cgroup statistics.
Based on the needs of each controller, we can incrementally add more statistics,
event based support for notification of statistics, accumulation of taskstats
into cgroup statistics in the future.
Sample output
# ./cgroupstats -C /cgroup/a
sleeping 2, blocked 0, running 1, stopped 0, uninterruptible 0
# ./cgroupstats -C /cgroup/
sleeping 154, blocked 0, running 0, stopped 0, uninterruptible 0
If the approach looks good, I'll enhance and post the user space utility for
the same
Feedback, comments, test results are always welcome!
[akpm@linux-foundation.org: build fix]
Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
Cc: Paul Menage <menage@google.com>
Cc: Jay Lan <jlan@engr.sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:44 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										switch  ( tsk - > state )  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										case  TASK_RUNNING : 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											stats - > nr_running + + ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											break ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										case  TASK_INTERRUPTIBLE : 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											stats - > nr_sleeping + + ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											break ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										case  TASK_UNINTERRUPTIBLE : 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											stats - > nr_uninterruptible + + ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											break ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										case  TASK_STOPPED : 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											stats - > nr_stopped + + ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											break ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										default : 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											if  ( delayacct_is_task_waiting_on_io ( tsk ) ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
												stats - > nr_io_wait + + ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											break ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										} 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									} 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:40:44 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									cgroup_iter_end ( cgrp ,  & it ) ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												Add cgroupstats
This patch is inspired by the discussion at
http://lkml.org/lkml/2007/4/11/187 and implements per cgroup statistics
as suggested by Andrew Morton in http://lkml.org/lkml/2007/4/11/263.  The
patch is on top of 2.6.21-mm1 with Paul's cgroups v9 patches (forward
ported)
This patch implements per cgroup statistics infrastructure and re-uses
code from the taskstats interface.  A new set of cgroup operations are
registered with commands and attributes.  It should be very easy to
*extend* per cgroup statistics, by adding members to the cgroupstats
structure.
The current model for cgroupstats is a pull, a push model (to post
statistics on interesting events), should be very easy to add.  Currently
user space requests for statistics by passing the cgroup file
descriptor.  Statistics about the state of all the tasks in the cgroup
is returned to user space.
TODO's/NOTE:
This patch provides an infrastructure for implementing cgroup statistics.
Based on the needs of each controller, we can incrementally add more statistics,
event based support for notification of statistics, accumulation of taskstats
into cgroup statistics in the future.
Sample output
# ./cgroupstats -C /cgroup/a
sleeping 2, blocked 0, running 1, stopped 0, uninterruptible 0
# ./cgroupstats -C /cgroup/
sleeping 154, blocked 0, running 0, stopped 0, uninterruptible 0
If the approach looks good, I'll enhance and post the user space utility for
the same
Feedback, comments, test results are always welcome!
[akpm@linux-foundation.org: build fix]
Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
Cc: Paul Menage <menage@google.com>
Cc: Jay Lan <jlan@engr.sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:44 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								err :  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									return  ret ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:25 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:32 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								/*
  
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:26 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								 *  seq_file  methods  for  the  tasks / procs  files .  The  seq_file  position  is  the 
							 
						 
					
						
							
								
									
										
										
										
											2008-10-18 20:28:04 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								 *  next  pid  to  display ;  the  seq_file  iterator  is  a  pointer  to  the  pid 
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:26 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								 *  in  the  cgroup - > l - > list  array . 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:32 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								 */ 
							 
						 
					
						
							
								
									
										
										
										
											2008-10-18 20:28:04 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:26 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								static  void  * cgroup_pidlist_start ( struct  seq_file  * s ,  loff_t  * pos )  
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:32 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
									
										
										
										
											2008-10-18 20:28:04 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									/*
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  Initially  we  receive  a  position  value  that  corresponds  to 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  one  more  than  the  last  pid  shown  ( or  0  on  the  first  call  or 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  after  a  seek  to  the  start ) .  Use  a  binary - search  to  find  the 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  next  pid  to  display ,  if  any 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 */ 
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:26 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									struct  cgroup_pidlist  * l  =  s - > private ; 
							 
						 
					
						
							
								
									
										
										
										
											2008-10-18 20:28:04 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									int  index  =  0 ,  pid  =  * pos ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									int  * iter ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:26 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									down_read ( & l - > mutex ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2008-10-18 20:28:04 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									if  ( pid )  { 
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:26 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										int  end  =  l - > length ; 
							 
						 
					
						
							
								
									
										
										
										
											2008-10-21 16:11:20 +11:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2008-10-18 20:28:04 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										while  ( index  <  end )  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											int  mid  =  ( index  +  end )  /  2 ; 
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:26 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
											if  ( l - > list [ mid ]  = =  pid )  { 
							 
						 
					
						
							
								
									
										
										
										
											2008-10-18 20:28:04 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
												index  =  mid ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
												break ; 
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:26 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
											}  else  if  ( l - > list [ mid ]  < =  pid ) 
							 
						 
					
						
							
								
									
										
										
										
											2008-10-18 20:28:04 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
												index  =  mid  +  1 ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											else 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
												end  =  mid ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										} 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									} 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									/* If we're off the end of the array, we're done */ 
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:26 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									if  ( index  > =  l - > length ) 
							 
						 
					
						
							
								
									
										
										
										
											2008-10-18 20:28:04 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										return  NULL ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									/* Update the abstract position to be the actual pid that we found */ 
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:26 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									iter  =  l - > list  +  index ; 
							 
						 
					
						
							
								
									
										
										
										
											2008-10-18 20:28:04 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									* pos  =  * iter ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									return  iter ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:26 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								static  void  cgroup_pidlist_stop ( struct  seq_file  * s ,  void  * v )  
						 
					
						
							
								
									
										
										
										
											2008-10-18 20:28:04 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:26 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									struct  cgroup_pidlist  * l  =  s - > private ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									up_read ( & l - > mutex ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2008-10-18 20:28:04 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:26 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								static  void  * cgroup_pidlist_next ( struct  seq_file  * s ,  void  * v ,  loff_t  * pos )  
						 
					
						
							
								
									
										
										
										
											2008-10-18 20:28:04 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:26 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									struct  cgroup_pidlist  * l  =  s - > private ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									pid_t  * p  =  v ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									pid_t  * end  =  l - > list  +  l - > length ; 
							 
						 
					
						
							
								
									
										
										
										
											2008-10-18 20:28:04 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									/*
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  Advance  to  the  next  pid  in  the  array .  If  this  goes  off  the 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  end ,  we ' re  done 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									p + + ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									if  ( p  > =  end )  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										return  NULL ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									}  else  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										* pos  =  * p ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										return  p ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									} 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:26 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								static  int  cgroup_pidlist_show ( struct  seq_file  * s ,  void  * v )  
						 
					
						
							
								
									
										
										
										
											2008-10-18 20:28:04 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									return  seq_printf ( s ,  " %d \n " ,  * ( int  * ) v ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:32 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:26 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								/*
  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  seq_operations  functions  for  iterating  on  pidlists  through  seq_file  - 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  independent  of  whether  it ' s  tasks  or  procs 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								static  const  struct  seq_operations  cgroup_pidlist_seq_operations  =  {  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									. start  =  cgroup_pidlist_start , 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									. stop  =  cgroup_pidlist_stop , 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									. next  =  cgroup_pidlist_next , 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									. show  =  cgroup_pidlist_show , 
							 
						 
					
						
							
								
									
										
										
										
											2008-10-18 20:28:04 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								} ;  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:26 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								static  void  cgroup_release_pid_array ( struct  cgroup_pidlist  * l )  
						 
					
						
							
								
									
										
										
										
											2008-10-18 20:28:04 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:27 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									/*
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  the  case  where  we ' re  the  last  user  of  this  particular  pidlist  will 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  have  us  remove  it  from  the  cgroup ' s  list ,  which  entails  taking  the 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  mutex .  since  in  pidlist_find  the  pidlist - > lock  depends  on  cgroup - > 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  pidlist_mutex ,  we  have  to  take  pidlist_mutex  first . 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									mutex_lock ( & l - > owner - > pidlist_mutex ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:26 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									down_write ( & l - > mutex ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									BUG_ON ( ! l - > use_count ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									if  ( ! - - l - > use_count )  { 
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:27 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										/* we're the last user if refcount is 0; remove and free */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										list_del ( & l - > links ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										mutex_unlock ( & l - > owner - > pidlist_mutex ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:28 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										pidlist_free ( l - > list ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:27 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										put_pid_ns ( l - > key . ns ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										up_write ( & l - > mutex ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										kfree ( l ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										return ; 
							 
						 
					
						
							
								
									
										
										
										
											2008-10-18 20:28:04 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									} 
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:27 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									mutex_unlock ( & l - > owner - > pidlist_mutex ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:26 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									up_write ( & l - > mutex ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:32 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:26 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								static  int  cgroup_pidlist_release ( struct  inode  * inode ,  struct  file  * file )  
						 
					
						
							
								
									
										
										
										
											2008-10-18 20:28:04 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:26 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									struct  cgroup_pidlist  * l ; 
							 
						 
					
						
							
								
									
										
										
										
											2008-10-18 20:28:04 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									if  ( ! ( file - > f_mode  &  FMODE_READ ) ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										return  0 ; 
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:26 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									/*
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  the  seq_file  will  only  be  initialized  if  the  file  was  opened  for 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  reading ;  hence  we  check  if  it ' s  not  null  only  in  that  case . 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									l  =  ( ( struct  seq_file  * ) file - > private_data ) - > private ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									cgroup_release_pid_array ( l ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2008-10-18 20:28:04 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									return  seq_release ( inode ,  file ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:26 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								static  const  struct  file_operations  cgroup_pidlist_operations  =  {  
						 
					
						
							
								
									
										
										
										
											2008-10-18 20:28:04 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									. read  =  seq_read , 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									. llseek  =  seq_lseek , 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									. write  =  cgroup_file_write , 
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:26 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									. release  =  cgroup_pidlist_release , 
							 
						 
					
						
							
								
									
										
										
										
											2008-10-18 20:28:04 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								} ;  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:32 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								/*
  
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:26 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								 *  The  following  functions  handle  opens  on  a  file  that  displays  a  pidlist 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  ( tasks  or  procs ) .  Prepare  an  array  of  the  process / thread  IDs  of  whoever ' s 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  in  the  cgroup . 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:32 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								 */ 
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:26 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								/* helper function for the two below it */  
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:27 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								static  int  cgroup_pidlist_open ( struct  file  * file ,  enum  cgroup_filetype  type )  
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:32 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:40:44 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									struct  cgroup  * cgrp  =  __d_cgrp ( file - > f_dentry - > d_parent ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:27 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									struct  cgroup_pidlist  * l ; 
							 
						 
					
						
							
								
									
										
										
										
											2008-10-18 20:28:04 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									int  retval ; 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:32 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2008-10-18 20:28:04 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									/* Nothing to do for write-only files */ 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:32 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									if  ( ! ( file - > f_mode  &  FMODE_READ ) ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										return  0 ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:26 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									/* have the array populated */ 
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:27 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									retval  =  pidlist_array_load ( cgrp ,  type ,  & l ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:26 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									if  ( retval ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										return  retval ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									/* configure file information */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									file - > f_op  =  & cgroup_pidlist_operations ; 
							 
						 
					
						
							
								
									
										
										
										
											2008-10-18 20:28:04 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:26 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									retval  =  seq_open ( file ,  & cgroup_pidlist_seq_operations ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2008-10-18 20:28:04 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									if  ( retval )  { 
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:26 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										cgroup_release_pid_array ( l ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2008-10-18 20:28:04 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										return  retval ; 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:32 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									} 
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:26 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									( ( struct  seq_file  * ) file - > private_data ) - > private  =  l ; 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:32 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									return  0 ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:26 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								static  int  cgroup_tasks_open ( struct  inode  * unused ,  struct  file  * file )  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:27 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									return  cgroup_pidlist_open ( file ,  CGROUP_FILE_TASKS ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:26 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								static  int  cgroup_procs_open ( struct  inode  * unused ,  struct  file  * file )  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:27 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									return  cgroup_pidlist_open ( file ,  CGROUP_FILE_PROCS ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:26 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:32 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:40:44 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								static  u64  cgroup_read_notify_on_release ( struct  cgroup  * cgrp ,  
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:38 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
													    struct  cftype  * cft ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:40:44 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									return  notify_on_release ( cgrp ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:38 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2008-07-25 01:47:01 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								static  int  cgroup_write_notify_on_release ( struct  cgroup  * cgrp ,  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
													  struct  cftype  * cft , 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
													  u64  val ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									clear_bit ( CGRP_RELEASABLE ,  & cgrp - > flags ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									if  ( val ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										set_bit ( CGRP_NOTIFY_ON_RELEASE ,  & cgrp - > flags ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									else 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										clear_bit ( CGRP_NOTIFY_ON_RELEASE ,  & cgrp - > flags ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									return  0 ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2010-03-10 15:22:20 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								/*
  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  Unregister  event  and  free  resources . 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 * 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  Gets  called  from  workqueue . 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								static  void  cgroup_event_remove ( struct  work_struct  * work )  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									struct  cgroup_event  * event  =  container_of ( work ,  struct  cgroup_event , 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											remove ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									struct  cgroup  * cgrp  =  event - > cgrp ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2013-02-18 18:56:14 +08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									remove_wait_queue ( event - > wqh ,  & event - > wait ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2010-03-10 15:22:20 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									event - > cft - > unregister_event ( cgrp ,  event - > cft ,  event - > eventfd ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2013-02-18 18:56:14 +08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									/* Notify userspace the event is going away. */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									eventfd_signal ( event - > eventfd ,  1 ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2010-03-10 15:22:20 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									eventfd_ctx_put ( event - > eventfd ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									kfree ( event ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2010-03-10 15:22:34 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									dput ( cgrp - > dentry ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2010-03-10 15:22:20 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								/*
  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  Gets  called  on  POLLHUP  on  eventfd  when  user  closes  it . 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 * 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  Called  with  wqh - > lock  held  and  interrupts  disabled . 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								static  int  cgroup_event_wake ( wait_queue_t  * wait ,  unsigned  mode ,  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										int  sync ,  void  * key ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									struct  cgroup_event  * event  =  container_of ( wait , 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											struct  cgroup_event ,  wait ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									struct  cgroup  * cgrp  =  event - > cgrp ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									unsigned  long  flags  =  ( unsigned  long ) key ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									if  ( flags  &  POLLHUP )  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										/*
 
							 
						 
					
						
							
								
									
										
										
										
											2013-02-18 18:56:14 +08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										 *  If  the  event  has  been  detached  at  cgroup  removal ,  we 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										 *  can  simply  return  knowing  the  other  side  will  cleanup 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										 *  for  us . 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										 * 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										 *  We  can ' t  race  against  event  freeing  since  the  other 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										 *  side  will  require  wqh - > lock  via  remove_wait_queue ( ) , 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										 *  which  we  hold . 
							 
						 
					
						
							
								
									
										
										
										
											2010-03-10 15:22:20 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										 */ 
							 
						 
					
						
							
								
									
										
										
										
											2013-02-18 18:56:14 +08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										spin_lock ( & cgrp - > event_list_lock ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										if  ( ! list_empty ( & event - > list ) )  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											list_del_init ( & event - > list ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											/*
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											 *  We  are  in  atomic  context ,  but  cgroup_event_remove ( ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											 *  may  sleep ,  so  we  have  to  call  it  in  workqueue . 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											 */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											schedule_work ( & event - > remove ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										} 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										spin_unlock ( & cgrp - > event_list_lock ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2010-03-10 15:22:20 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									} 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									return  0 ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								static  void  cgroup_event_ptable_queue_proc ( struct  file  * file ,  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										wait_queue_head_t  * wqh ,  poll_table  * pt ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									struct  cgroup_event  * event  =  container_of ( pt , 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											struct  cgroup_event ,  pt ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									event - > wqh  =  wqh ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									add_wait_queue ( wqh ,  & event - > wait ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								/*
  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  Parse  input  and  register  new  cgroup  event  handler . 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 * 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  Input  must  be  in  format  ' < event_fd >  < control_fd >  < args > ' . 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  Interpretation  of  args  is  defined  by  control  file  implementation . 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								static  int  cgroup_write_event_control ( struct  cgroup  * cgrp ,  struct  cftype  * cft ,  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
												      const  char  * buffer ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									struct  cgroup_event  * event  =  NULL ; 
							 
						 
					
						
							
								
									
										
										
										
											2013-02-18 14:13:35 +08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									struct  cgroup  * cgrp_cfile ; 
							 
						 
					
						
							
								
									
										
										
										
											2010-03-10 15:22:20 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									unsigned  int  efd ,  cfd ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									struct  file  * efile  =  NULL ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									struct  file  * cfile  =  NULL ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									char  * endp ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									int  ret ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									efd  =  simple_strtoul ( buffer ,  & endp ,  10 ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									if  ( * endp  ! =  '   ' ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										return  - EINVAL ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									buffer  =  endp  +  1 ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									cfd  =  simple_strtoul ( buffer ,  & endp ,  10 ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									if  ( ( * endp  ! =  '   ' )  & &  ( * endp  ! =  ' \0 ' ) ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										return  - EINVAL ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									buffer  =  endp  +  1 ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									event  =  kzalloc ( sizeof ( * event ) ,  GFP_KERNEL ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									if  ( ! event ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										return  - ENOMEM ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									event - > cgrp  =  cgrp ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									INIT_LIST_HEAD ( & event - > list ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									init_poll_funcptr ( & event - > pt ,  cgroup_event_ptable_queue_proc ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									init_waitqueue_func_entry ( & event - > wait ,  cgroup_event_wake ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									INIT_WORK ( & event - > remove ,  cgroup_event_remove ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									efile  =  eventfd_fget ( efd ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									if  ( IS_ERR ( efile ) )  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										ret  =  PTR_ERR ( efile ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										goto  fail ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									} 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									event - > eventfd  =  eventfd_ctx_fileget ( efile ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									if  ( IS_ERR ( event - > eventfd ) )  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										ret  =  PTR_ERR ( event - > eventfd ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										goto  fail ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									} 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									cfile  =  fget ( cfd ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									if  ( ! cfile )  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										ret  =  - EBADF ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										goto  fail ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									} 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									/* the process need read permission on control file */ 
							 
						 
					
						
							
								
									
										
										
										
											2011-06-19 12:55:10 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									/* AV: shouldn't we check that it's been opened for read instead? */ 
							 
						 
					
						
							
								
									
										
										
										
											2013-01-23 17:07:38 -05:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									ret  =  inode_permission ( file_inode ( cfile ) ,  MAY_READ ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2010-03-10 15:22:20 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									if  ( ret  <  0 ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										goto  fail ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									event - > cft  =  __file_cft ( cfile ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									if  ( IS_ERR ( event - > cft ) )  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										ret  =  PTR_ERR ( event - > cft ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										goto  fail ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									} 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2013-02-18 14:13:35 +08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									/*
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  The  file  to  be  monitored  must  be  in  the  same  cgroup  as 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  cgroup . event_control  is . 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									cgrp_cfile  =  __d_cgrp ( cfile - > f_dentry - > d_parent ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									if  ( cgrp_cfile  ! =  cgrp )  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										ret  =  - EINVAL ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										goto  fail ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									} 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2010-03-10 15:22:20 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									if  ( ! event - > cft - > register_event  | |  ! event - > cft - > unregister_event )  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										ret  =  - EINVAL ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										goto  fail ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									} 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									ret  =  event - > cft - > register_event ( cgrp ,  event - > cft , 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											event - > eventfd ,  buffer ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									if  ( ret ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										goto  fail ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									if  ( efile - > f_op - > poll ( efile ,  & event - > pt )  &  POLLHUP )  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										event - > cft - > unregister_event ( cgrp ,  event - > cft ,  event - > eventfd ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										ret  =  0 ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										goto  fail ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									} 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2010-03-10 15:22:34 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									/*
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  Events  should  be  removed  after  rmdir  of  cgroup  directory ,  but  before 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  destroying  subsystem  state  objects .  Let ' s  take  reference  to  cgroup 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  directory  dentry  to  do  that . 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									dget ( cgrp - > dentry ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2010-03-10 15:22:20 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									spin_lock ( & cgrp - > event_list_lock ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									list_add ( & event - > list ,  & cgrp - > event_list ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									spin_unlock ( & cgrp - > event_list_lock ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									fput ( cfile ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									fput ( efile ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									return  0 ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								fail :  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									if  ( cfile ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										fput ( cfile ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									if  ( event  & &  event - > eventfd  & &  ! IS_ERR ( event - > eventfd ) ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										eventfd_ctx_put ( event - > eventfd ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									if  ( ! IS_ERR_OR_NULL ( efile ) ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										fput ( efile ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									kfree ( event ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									return  ret ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2010-10-27 15:33:35 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								static  u64  cgroup_clone_children_read ( struct  cgroup  * cgrp ,  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
												    struct  cftype  * cft ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
									
										
										
										
											2012-11-19 08:13:38 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									return  test_bit ( CGRP_CPUSET_CLONE_CHILDREN ,  & cgrp - > flags ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2010-10-27 15:33:35 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								static  int  cgroup_clone_children_write ( struct  cgroup  * cgrp ,  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
												     struct  cftype  * cft , 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
												     u64  val ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									if  ( val ) 
							 
						 
					
						
							
								
									
										
										
										
											2012-11-19 08:13:38 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										set_bit ( CGRP_CPUSET_CLONE_CHILDREN ,  & cgrp - > flags ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2010-10-27 15:33:35 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									else 
							 
						 
					
						
							
								
									
										
										
										
											2012-11-19 08:13:38 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										clear_bit ( CGRP_CPUSET_CLONE_CHILDREN ,  & cgrp - > flags ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2010-10-27 15:33:35 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									return  0 ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:32 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								/*
  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  for  the  common  functions ,  ' private '  gives  the  type  of  file 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 */ 
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:26 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								/* for hysterical raisins, we can't put this on the older files */  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								# define CGROUP_FILE_GENERIC_PREFIX "cgroup." 
  
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:38 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								static  struct  cftype  files [ ]  =  {  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									{ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										. name  =  " tasks " , 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										. open  =  cgroup_tasks_open , 
							 
						 
					
						
							
								
									
										
										
										
											2008-07-25 01:47:01 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										. write_u64  =  cgroup_tasks_write , 
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:26 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										. release  =  cgroup_pidlist_release , 
							 
						 
					
						
							
								
									
										
										
										
											2009-04-02 16:57:29 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										. mode  =  S_IRUGO  |  S_IWUSR , 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:38 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									} , 
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:26 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									{ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										. name  =  CGROUP_FILE_GENERIC_PREFIX  " procs " , 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										. open  =  cgroup_procs_open , 
							 
						 
					
						
							
								
									
										
										
										
											2011-05-26 16:25:20 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										. write_u64  =  cgroup_procs_write , 
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:26 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										. release  =  cgroup_pidlist_release , 
							 
						 
					
						
							
								
									
										
										
										
											2011-05-26 16:25:20 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										. mode  =  S_IRUGO  |  S_IWUSR , 
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:26 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									} , 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:38 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									{ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										. name  =  " notify_on_release " , 
							 
						 
					
						
							
								
									
										
										
										
											2008-04-29 00:59:56 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										. read_u64  =  cgroup_read_notify_on_release , 
							 
						 
					
						
							
								
									
										
										
										
											2008-07-25 01:47:01 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										. write_u64  =  cgroup_write_notify_on_release , 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:38 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									} , 
							 
						 
					
						
							
								
									
										
										
										
											2010-03-10 15:22:20 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									{ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										. name  =  CGROUP_FILE_GENERIC_PREFIX  " event_control " , 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										. write_string  =  cgroup_write_event_control , 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										. mode  =  S_IWUGO , 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									} , 
							 
						 
					
						
							
								
									
										
										
										
											2010-10-27 15:33:35 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									{ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										. name  =  " cgroup.clone_children " , 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										. read_u64  =  cgroup_clone_children_read , 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										. write_u64  =  cgroup_clone_children_write , 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									} , 
							 
						 
					
						
							
								
									
										
										
										
											2012-04-01 12:09:55 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									{ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										. name  =  " release_agent " , 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										. flags  =  CFTYPE_ONLY_ON_ROOT , 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										. read_seq_string  =  cgroup_release_agent_show , 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										. write_string  =  cgroup_release_agent_write , 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										. max_write_len  =  PATH_MAX , 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									} , 
							 
						 
					
						
							
								
									
										
										
										
											2012-04-01 12:09:55 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									{  } 	/* terminate */ 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:32 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								} ;  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2012-08-23 16:53:29 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								/**
  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  cgroup_populate_dir  -  selectively  creation  of  files  in  a  directory 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  @ cgrp :  target  cgroup 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  @ base_files :  true  if  the  base  files  should  be  added 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  @ subsys_mask :  mask  of  the  subsystem  ids  whose  files  should  be  added 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								static  int  cgroup_populate_dir ( struct  cgroup  * cgrp ,  bool  base_files ,  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											       unsigned  long  subsys_mask ) 
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									int  err ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									struct  cgroup_subsys  * ss ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2012-08-23 16:53:29 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									if  ( base_files )  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										err  =  cgroup_addrm_files ( cgrp ,  NULL ,  files ,  true ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										if  ( err  <  0 ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											return  err ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									} 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:32 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2012-04-01 12:09:55 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									/* process cftsets of each subsystem */ 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:40:44 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									for_each_subsys ( cgrp - > root ,  ss )  { 
							 
						 
					
						
							
								
									
										
										
										
											2012-04-01 12:09:55 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										struct  cftype_set  * set ; 
							 
						 
					
						
							
								
									
										
										
										
											2012-08-23 16:53:29 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										if  ( ! test_bit ( ss - > subsys_id ,  & subsys_mask ) ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											continue ; 
							 
						 
					
						
							
								
									
										
										
										
											2012-04-01 12:09:55 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2012-04-01 12:09:55 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										list_for_each_entry ( set ,  & ss - > cftsets ,  node ) 
							 
						 
					
						
							
								
									
										
										
										
											2012-04-01 12:09:56 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
											cgroup_addrm_files ( cgrp ,  ss ,  set - > cfts ,  true ) ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
									} 
							 
						 
					
						
							
								
									
										
										
										
											2012-04-01 12:09:55 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
											 
										
											
												cgroup: CSS ID support
Patch for Per-CSS(Cgroup Subsys State) ID and private hierarchy code.
This patch attaches unique ID to each css and provides following.
 - css_lookup(subsys, id)
   returns pointer to struct cgroup_subysys_state of id.
 - css_get_next(subsys, id, rootid, depth, foundid)
   returns the next css under "root" by scanning
When cgroup_subsys->use_id is set, an id for css is maintained.
The cgroup framework only parepares
	- css_id of root css for subsys
	- id is automatically attached at creation of css.
	- id is *not* freed automatically. Because the cgroup framework
	  don't know lifetime of cgroup_subsys_state.
	  free_css_id() function is provided. This must be called by subsys.
There are several reasons to develop this.
	- Saving space .... For example, memcg's swap_cgroup is array of
	  pointers to cgroup. But it is not necessary to be very fast.
	  By replacing pointers(8bytes per ent) to ID (2byes per ent), we can
	  reduce much amount of memory usage.
	- Scanning without lock.
	  CSS_ID provides "scan id under this ROOT" function. By this, scanning
	  css under root can be written without locks.
	  ex)
	  do {
		rcu_read_lock();
		next = cgroup_get_next(subsys, id, root, &found);
		/* check sanity of next here */
		css_tryget();
		rcu_read_unlock();
		id = found + 1
	 } while(...)
Characteristics:
	- Each css has unique ID under subsys.
	- Lifetime of ID is controlled by subsys.
	- css ID contains "ID" and "Depth in hierarchy" and stack of hierarchy
	- Allowed ID is 1-65535, ID 0 is UNUSED ID.
Design Choices:
	- scan-by-ID v.s. scan-by-tree-walk.
	  As /proc's pid scan does, scan-by-ID is robust when scanning is done
	  by following kind of routine.
	  scan -> rest a while(release a lock) -> conitunue from interrupted
	  memcg's hierarchical reclaim does this.
	- When subsys->use_id is set, # of css in the system is limited to
	  65535.
[bharata@linux.vnet.ibm.com: remove rcu_read_lock() from css_get_next()]
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Paul Menage <menage@google.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2009-04-02 16:57:25 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									/* This cgroup is ready now */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									for_each_subsys ( cgrp - > root ,  ss )  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										struct  cgroup_subsys_state  * css  =  cgrp - > subsys [ ss - > subsys_id ] ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										/*
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										 *  Update  id - > css  pointer  and  make  this  css  visible  from 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										 *  CSS  ID  functions .  This  pointer  will  be  dereferened 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										 *  from  RCU - read - side  without  locks . 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										 */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										if  ( css - > id ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											rcu_assign_pointer ( css - > id - > css ,  css ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									} 
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									return  0 ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
											 
										
											
												cgroup: make css->refcnt clearing on cgroup removal optional
Currently, cgroup removal tries to drain all css references.  If there
are active css references, the removal logic waits and retries
->pre_detroy() until either all refs drop to zero or removal is
cancelled.
This semantics is unusual and adds non-trivial complexity to cgroup
core and IMHO is fundamentally misguided in that it couples internal
implementation details (references to internal data structure) with
externally visible operation (rmdir).  To userland, this is a behavior
peculiarity which is unnecessary and difficult to expect (css refs is
otherwise invisible from userland), and, to policy implementations,
this is an unnecessary restriction (e.g. blkcg wants to hold css refs
for caching purposes but can't as that becomes visible as rmdir hang).
Unfortunately, memcg currently depends on ->pre_destroy() retrials and
cgroup removal vetoing and can't be immmediately switched to the new
behavior.  This patch introduces the new behavior of not waiting for
css refs to drain and maintains the old behavior for subsystems which
have __DEPRECATED_clear_css_refs set.
Once, memcg is updated, we can drop the code paths for the old
behavior as proposed in the following patch.  Note that the following
patch is incorrect in that dput work item is in cgroup and may lose
some of dputs when multiples css's are released back-to-back, and
__css_put() triggers check_for_release() when refcnt reaches 0 instead
of 1; however, it shows what part can be removed.
  http://thread.gmane.org/gmane.linux.kernel.containers/22559/focus=75251
Note that, in not-too-distant future, cgroup core will start emitting
warning messages for subsys which require the old behavior, so please
get moving.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizf@cn.fujitsu.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
											 
										 
										
											2012-04-01 12:09:56 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								static  void  css_dput_fn ( struct  work_struct  * work )  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									struct  cgroup_subsys_state  * css  = 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										container_of ( work ,  struct  cgroup_subsys_state ,  dput_work ) ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												cgroup: fix cgroup hierarchy umount race
48ddbe1946 "cgroup: make css->refcnt clearing on cgroup removal
optional" allowed a css to linger after the associated cgroup is
removed.  As a css holds a reference on the cgroup's dentry, it means
that cgroup dentries may linger for a while.
Destroying a superblock which has dentries with positive refcnts is a
critical bug and triggers BUG() in vfs code.  As each cgroup dentry
holds an s_active reference, any lingering cgroup has both its dentry
and the superblock pinned and thus preventing premature release of
superblock.
Unfortunately, after 48ddbe1946, there's a small window while
releasing a cgroup which is directly under the root of the hierarchy.
When a cgroup directory is released, vfs layer first deletes the
corresponding dentry and then invokes dput() on the parent, which may
recurse further, so when a cgroup directly below root cgroup is
released, the cgroup is first destroyed - which releases the s_active
it was holding - and then the dentry for the root cgroup is dput().
This creates a window where the root dentry's refcnt isn't zero but
superblock's s_active is.  If umount happens before or during this
window, vfs will see the root dentry with non-zero refcnt and trigger
BUG().
Before 48ddbe1946, this problem didn't exist because the last dentry
reference was guaranteed to be put synchronously from rmdir(2)
invocation which holds s_active around the whole process.
Fix it by holding an extra superblock->s_active reference across
dput() from css release, which is the dput() path added by 48ddbe1946
and the only one which doesn't hold an extra s_active ref across the
final cgroup dput().
Signed-off-by: Tejun Heo <tj@kernel.org>
LKML-Reference: <4FEEA5CB.8070809@huawei.com>
Reported-by: shyju pv <shyju.pv@huawei.com>
Tested-by: shyju pv <shyju.pv@huawei.com>
Cc: Sasha Levin <levinsasha928@gmail.com>
Acked-by: Li Zefan <lizefan@huawei.com>
											 
										 
										
											2012-07-07 16:08:18 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									struct  dentry  * dentry  =  css - > cgroup - > dentry ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									struct  super_block  * sb  =  dentry - > d_sb ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												cgroup: make css->refcnt clearing on cgroup removal optional
Currently, cgroup removal tries to drain all css references.  If there
are active css references, the removal logic waits and retries
->pre_detroy() until either all refs drop to zero or removal is
cancelled.
This semantics is unusual and adds non-trivial complexity to cgroup
core and IMHO is fundamentally misguided in that it couples internal
implementation details (references to internal data structure) with
externally visible operation (rmdir).  To userland, this is a behavior
peculiarity which is unnecessary and difficult to expect (css refs is
otherwise invisible from userland), and, to policy implementations,
this is an unnecessary restriction (e.g. blkcg wants to hold css refs
for caching purposes but can't as that becomes visible as rmdir hang).
Unfortunately, memcg currently depends on ->pre_destroy() retrials and
cgroup removal vetoing and can't be immmediately switched to the new
behavior.  This patch introduces the new behavior of not waiting for
css refs to drain and maintains the old behavior for subsystems which
have __DEPRECATED_clear_css_refs set.
Once, memcg is updated, we can drop the code paths for the old
behavior as proposed in the following patch.  Note that the following
patch is incorrect in that dput work item is in cgroup and may lose
some of dputs when multiples css's are released back-to-back, and
__css_put() triggers check_for_release() when refcnt reaches 0 instead
of 1; however, it shows what part can be removed.
  http://thread.gmane.org/gmane.linux.kernel.containers/22559/focus=75251
Note that, in not-too-distant future, cgroup core will start emitting
warning messages for subsys which require the old behavior, so please
get moving.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizf@cn.fujitsu.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
											 
										 
										
											2012-04-01 12:09:56 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
											 
										
											
												cgroup: fix cgroup hierarchy umount race
48ddbe1946 "cgroup: make css->refcnt clearing on cgroup removal
optional" allowed a css to linger after the associated cgroup is
removed.  As a css holds a reference on the cgroup's dentry, it means
that cgroup dentries may linger for a while.
Destroying a superblock which has dentries with positive refcnts is a
critical bug and triggers BUG() in vfs code.  As each cgroup dentry
holds an s_active reference, any lingering cgroup has both its dentry
and the superblock pinned and thus preventing premature release of
superblock.
Unfortunately, after 48ddbe1946, there's a small window while
releasing a cgroup which is directly under the root of the hierarchy.
When a cgroup directory is released, vfs layer first deletes the
corresponding dentry and then invokes dput() on the parent, which may
recurse further, so when a cgroup directly below root cgroup is
released, the cgroup is first destroyed - which releases the s_active
it was holding - and then the dentry for the root cgroup is dput().
This creates a window where the root dentry's refcnt isn't zero but
superblock's s_active is.  If umount happens before or during this
window, vfs will see the root dentry with non-zero refcnt and trigger
BUG().
Before 48ddbe1946, this problem didn't exist because the last dentry
reference was guaranteed to be put synchronously from rmdir(2)
invocation which holds s_active around the whole process.
Fix it by holding an extra superblock->s_active reference across
dput() from css release, which is the dput() path added by 48ddbe1946
and the only one which doesn't hold an extra s_active ref across the
final cgroup dput().
Signed-off-by: Tejun Heo <tj@kernel.org>
LKML-Reference: <4FEEA5CB.8070809@huawei.com>
Reported-by: shyju pv <shyju.pv@huawei.com>
Tested-by: shyju pv <shyju.pv@huawei.com>
Cc: Sasha Levin <levinsasha928@gmail.com>
Acked-by: Li Zefan <lizefan@huawei.com>
											 
										 
										
											2012-07-07 16:08:18 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									atomic_inc ( & sb - > s_active ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									dput ( dentry ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									deactivate_super ( sb ) ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												cgroup: make css->refcnt clearing on cgroup removal optional
Currently, cgroup removal tries to drain all css references.  If there
are active css references, the removal logic waits and retries
->pre_detroy() until either all refs drop to zero or removal is
cancelled.
This semantics is unusual and adds non-trivial complexity to cgroup
core and IMHO is fundamentally misguided in that it couples internal
implementation details (references to internal data structure) with
externally visible operation (rmdir).  To userland, this is a behavior
peculiarity which is unnecessary and difficult to expect (css refs is
otherwise invisible from userland), and, to policy implementations,
this is an unnecessary restriction (e.g. blkcg wants to hold css refs
for caching purposes but can't as that becomes visible as rmdir hang).
Unfortunately, memcg currently depends on ->pre_destroy() retrials and
cgroup removal vetoing and can't be immmediately switched to the new
behavior.  This patch introduces the new behavior of not waiting for
css refs to drain and maintains the old behavior for subsystems which
have __DEPRECATED_clear_css_refs set.
Once, memcg is updated, we can drop the code paths for the old
behavior as proposed in the following patch.  Note that the following
patch is incorrect in that dput work item is in cgroup and may lose
some of dputs when multiples css's are released back-to-back, and
__css_put() triggers check_for_release() when refcnt reaches 0 instead
of 1; however, it shows what part can be removed.
  http://thread.gmane.org/gmane.linux.kernel.containers/22559/focus=75251
Note that, in not-too-distant future, cgroup core will start emitting
warning messages for subsys which require the old behavior, so please
get moving.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizf@cn.fujitsu.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
											 
										 
										
											2012-04-01 12:09:56 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								static  void  init_cgroup_css ( struct  cgroup_subsys_state  * css ,  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											       struct  cgroup_subsys  * ss , 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:40:44 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
											       struct  cgroup  * cgrp ) 
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:40:44 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									css - > cgroup  =  cgrp ; 
							 
						 
					
						
							
								
									
										
										
										
											2009-01-07 18:08:38 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									atomic_set ( & css - > refcnt ,  1 ) ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
									css - > flags  =  0 ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												cgroup: CSS ID support
Patch for Per-CSS(Cgroup Subsys State) ID and private hierarchy code.
This patch attaches unique ID to each css and provides following.
 - css_lookup(subsys, id)
   returns pointer to struct cgroup_subysys_state of id.
 - css_get_next(subsys, id, rootid, depth, foundid)
   returns the next css under "root" by scanning
When cgroup_subsys->use_id is set, an id for css is maintained.
The cgroup framework only parepares
	- css_id of root css for subsys
	- id is automatically attached at creation of css.
	- id is *not* freed automatically. Because the cgroup framework
	  don't know lifetime of cgroup_subsys_state.
	  free_css_id() function is provided. This must be called by subsys.
There are several reasons to develop this.
	- Saving space .... For example, memcg's swap_cgroup is array of
	  pointers to cgroup. But it is not necessary to be very fast.
	  By replacing pointers(8bytes per ent) to ID (2byes per ent), we can
	  reduce much amount of memory usage.
	- Scanning without lock.
	  CSS_ID provides "scan id under this ROOT" function. By this, scanning
	  css under root can be written without locks.
	  ex)
	  do {
		rcu_read_lock();
		next = cgroup_get_next(subsys, id, root, &found);
		/* check sanity of next here */
		css_tryget();
		rcu_read_unlock();
		id = found + 1
	 } while(...)
Characteristics:
	- Each css has unique ID under subsys.
	- Lifetime of ID is controlled by subsys.
	- css ID contains "ID" and "Depth in hierarchy" and stack of hierarchy
	- Allowed ID is 1-65535, ID 0 is UNUSED ID.
Design Choices:
	- scan-by-ID v.s. scan-by-tree-walk.
	  As /proc's pid scan does, scan-by-ID is robust when scanning is done
	  by following kind of routine.
	  scan -> rest a while(release a lock) -> conitunue from interrupted
	  memcg's hierarchical reclaim does this.
	- When subsys->use_id is set, # of css in the system is limited to
	  65535.
[bharata@linux.vnet.ibm.com: remove rcu_read_lock() from css_get_next()]
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Paul Menage <menage@google.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2009-04-02 16:57:25 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									css - > id  =  NULL ; 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:40:44 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									if  ( cgrp  = =  dummytop ) 
							 
						 
					
						
							
								
									
										
										
										
											2012-11-19 08:13:36 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										css - > flags  | =  CSS_ROOT ; 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:40:44 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									BUG_ON ( cgrp - > subsys [ ss - > subsys_id ] ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									cgrp - > subsys [ ss - > subsys_id ]  =  css ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												cgroup: make css->refcnt clearing on cgroup removal optional
Currently, cgroup removal tries to drain all css references.  If there
are active css references, the removal logic waits and retries
->pre_detroy() until either all refs drop to zero or removal is
cancelled.
This semantics is unusual and adds non-trivial complexity to cgroup
core and IMHO is fundamentally misguided in that it couples internal
implementation details (references to internal data structure) with
externally visible operation (rmdir).  To userland, this is a behavior
peculiarity which is unnecessary and difficult to expect (css refs is
otherwise invisible from userland), and, to policy implementations,
this is an unnecessary restriction (e.g. blkcg wants to hold css refs
for caching purposes but can't as that becomes visible as rmdir hang).
Unfortunately, memcg currently depends on ->pre_destroy() retrials and
cgroup removal vetoing and can't be immmediately switched to the new
behavior.  This patch introduces the new behavior of not waiting for
css refs to drain and maintains the old behavior for subsystems which
have __DEPRECATED_clear_css_refs set.
Once, memcg is updated, we can drop the code paths for the old
behavior as proposed in the following patch.  Note that the following
patch is incorrect in that dput work item is in cgroup and may lose
some of dputs when multiples css's are released back-to-back, and
__css_put() triggers check_for_release() when refcnt reaches 0 instead
of 1; however, it shows what part can be removed.
  http://thread.gmane.org/gmane.linux.kernel.containers/22559/focus=75251
Note that, in not-too-distant future, cgroup core will start emitting
warning messages for subsys which require the old behavior, so please
get moving.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizf@cn.fujitsu.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
											 
										 
										
											2012-04-01 12:09:56 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									/*
 
							 
						 
					
						
							
								
									
										
										
										
											2012-11-05 09:16:58 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									 *  css  holds  an  extra  ref  to  @ cgrp - > dentry  which  is  put  on  the  last 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  css_put ( ) .   dput ( )  requires  process  context ,  which  css_put ( )  may 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  be  called  without .   @ css - > dput_work  will  be  used  to  invoke 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  dput ( )  asynchronously  from  css_put ( ) . 
							 
						 
					
						
							
								
									
										
											 
										
											
												cgroup: make css->refcnt clearing on cgroup removal optional
Currently, cgroup removal tries to drain all css references.  If there
are active css references, the removal logic waits and retries
->pre_detroy() until either all refs drop to zero or removal is
cancelled.
This semantics is unusual and adds non-trivial complexity to cgroup
core and IMHO is fundamentally misguided in that it couples internal
implementation details (references to internal data structure) with
externally visible operation (rmdir).  To userland, this is a behavior
peculiarity which is unnecessary and difficult to expect (css refs is
otherwise invisible from userland), and, to policy implementations,
this is an unnecessary restriction (e.g. blkcg wants to hold css refs
for caching purposes but can't as that becomes visible as rmdir hang).
Unfortunately, memcg currently depends on ->pre_destroy() retrials and
cgroup removal vetoing and can't be immmediately switched to the new
behavior.  This patch introduces the new behavior of not waiting for
css refs to drain and maintains the old behavior for subsystems which
have __DEPRECATED_clear_css_refs set.
Once, memcg is updated, we can drop the code paths for the old
behavior as proposed in the following patch.  Note that the following
patch is incorrect in that dput work item is in cgroup and may lose
some of dputs when multiples css's are released back-to-back, and
__css_put() triggers check_for_release() when refcnt reaches 0 instead
of 1; however, it shows what part can be removed.
  http://thread.gmane.org/gmane.linux.kernel.containers/22559/focus=75251
Note that, in not-too-distant future, cgroup core will start emitting
warning messages for subsys which require the old behavior, so please
get moving.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizf@cn.fujitsu.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
											 
										 
										
											2012-04-01 12:09:56 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									 */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									INIT_WORK ( & css - > dput_work ,  css_dput_fn ) ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2012-11-19 08:13:38 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								/* invoke ->post_create() on a new CSS and mark it online if successful */  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								static  int  online_css ( struct  cgroup_subsys  * ss ,  struct  cgroup  * cgrp )  
						 
					
						
							
								
									
										
										
										
											2012-11-19 08:13:37 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
									
										
										
										
											2012-11-19 08:13:38 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									int  ret  =  0 ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2012-11-19 08:13:37 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									lockdep_assert_held ( & cgroup_mutex ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2012-11-19 08:13:38 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									if  ( ss - > css_online ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										ret  =  ss - > css_online ( cgrp ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2012-11-19 08:13:38 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									if  ( ! ret ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										cgrp - > subsys [ ss - > subsys_id ] - > flags  | =  CSS_ONLINE ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									return  ret ; 
							 
						 
					
						
							
								
									
										
										
										
											2012-11-19 08:13:37 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								/* if the CSS is online, invoke ->pre_destory() on it and mark it offline */  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								static  void  offline_css ( struct  cgroup_subsys  * ss ,  struct  cgroup  * cgrp )  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									__releases ( & cgroup_mutex )  __acquires ( & cgroup_mutex ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									struct  cgroup_subsys_state  * css  =  cgrp - > subsys [ ss - > subsys_id ] ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									lockdep_assert_held ( & cgroup_mutex ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									if  ( ! ( css - > flags  &  CSS_ONLINE ) ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										return ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									/*
 
							 
						 
					
						
							
								
									
										
										
										
											2012-11-19 08:13:38 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									 *  css_offline ( )  should  be  called  with  cgroup_mutex  unlocked .   See 
							 
						 
					
						
							
								
									
										
										
										
											2012-11-19 08:13:37 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									 *  3f a59dfbc3  ( " cgroup: fix potential deadlock in pre_destroy " )  for 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  details .   This  temporary  unlocking  should  go  away  once 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  cgroup_mutex  is  unexported  from  controllers . 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 */ 
							 
						 
					
						
							
								
									
										
										
										
											2012-11-19 08:13:38 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									if  ( ss - > css_offline )  { 
							 
						 
					
						
							
								
									
										
										
										
											2012-11-19 08:13:37 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										mutex_unlock ( & cgroup_mutex ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2012-11-19 08:13:38 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										ss - > css_offline ( cgrp ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2012-11-19 08:13:37 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										mutex_lock ( & cgroup_mutex ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									} 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									cgrp - > subsys [ ss - > subsys_id ] - > flags  & =  ~ CSS_ONLINE ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								/*
  
						 
					
						
							
								
									
										
										
										
											2008-02-23 15:24:09 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								 *  cgroup_create  -  create  a  cgroup 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  @ parent :  cgroup  that  will  be  parent  of  the  new  cgroup 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  @ dentry :  dentry  of  the  new  cgroup 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  @ mode :  mode  to  set  on  new  inode 
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								 * 
							 
						 
					
						
							
								
									
										
										
										
											2008-02-23 15:24:09 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								 *  Must  be  called  with  the  mutex  on  the  parent  inode  held 
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								 */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								static  long  cgroup_create ( struct  cgroup  * parent ,  struct  dentry  * dentry ,  
						 
					
						
							
								
									
										
										
										
											2011-07-26 01:55:55 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
											     umode_t  mode ) 
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:40:44 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									struct  cgroup  * cgrp ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
									struct  cgroupfs_root  * root  =  parent - > root ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									int  err  =  0 ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									struct  cgroup_subsys  * ss ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									struct  super_block  * sb  =  root - > sb ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2012-11-19 09:02:12 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									/* allocate the cgroup and its ID, 0 is reserved for the root */ 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:40:44 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									cgrp  =  kzalloc ( sizeof ( * cgrp ) ,  GFP_KERNEL ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									if  ( ! cgrp ) 
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
										return  - ENOMEM ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2012-11-19 09:02:12 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									cgrp - > id  =  ida_simple_get ( & root - > cgroup_ida ,  1 ,  0 ,  GFP_KERNEL ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									if  ( cgrp - > id  <  0 ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										goto  err_free_cgrp ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2012-11-05 09:16:59 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									/*
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  Only  live  parents  can  have  children .   Note  that  the  liveliness 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  check  isn ' t  strictly  necessary  because  cgroup_mkdir ( )  and 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  cgroup_rmdir ( )  are  fully  synchronized  by  i_mutex ;  however ,  do  it 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  anyway  so  that  locking  is  contained  inside  cgroup  proper  and  we 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  don ' t  get  nasty  surprises  if  we  ever  grow  another  caller . 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									if  ( ! cgroup_lock_live_group ( parent ) )  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										err  =  - ENODEV ; 
							 
						 
					
						
							
								
									
										
										
										
											2012-11-19 09:02:12 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										goto  err_free_id ; 
							 
						 
					
						
							
								
									
										
										
										
											2012-11-05 09:16:59 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									} 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
									/* Grab a reference on the superblock so the hierarchy doesn't
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  get  deleted  on  unmount  if  there  are  child  cgroups .   This 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  can  be  done  outside  cgroup_mutex ,  since  the  sb  can ' t 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  disappear  while  someone  has  an  open  control  file  on  the 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  fs  */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									atomic_inc ( & sb - > s_active ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2008-10-18 20:28:04 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									init_cgroup_housekeeping ( cgrp ) ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2013-01-24 14:30:22 +08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									dentry - > d_fsdata  =  cgrp ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									cgrp - > dentry  =  dentry ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:40:44 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									cgrp - > parent  =  parent ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									cgrp - > root  =  parent - > root ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									cgrp - > top_cgroup  =  parent - > top_cgroup ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2008-03-04 14:28:19 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									if  ( notify_on_release ( parent ) ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										set_bit ( CGRP_NOTIFY_ON_RELEASE ,  & cgrp - > flags ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2012-11-19 08:13:38 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									if  ( test_bit ( CGRP_CPUSET_CLONE_CHILDREN ,  & parent - > flags ) ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										set_bit ( CGRP_CPUSET_CLONE_CHILDREN ,  & cgrp - > flags ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2010-10-27 15:33:35 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
									for_each_subsys ( root ,  ss )  { 
							 
						 
					
						
							
								
									
										
										
										
											2012-09-13 12:20:58 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										struct  cgroup_subsys_state  * css ; 
							 
						 
					
						
							
								
									
										
										
										
											2010-02-02 13:44:10 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2012-11-19 08:13:38 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										css  =  ss - > css_alloc ( cgrp ) ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
										if  ( IS_ERR ( css ) )  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											err  =  PTR_ERR ( css ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2012-11-19 08:13:38 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
											goto  err_free_all ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
										} 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:40:44 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										init_cgroup_css ( css ,  ss ,  cgrp ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2010-02-02 13:44:10 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										if  ( ss - > use_id )  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											err  =  alloc_css_id ( ss ,  parent ,  cgrp ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											if  ( err ) 
							 
						 
					
						
							
								
									
										
										
										
											2012-11-19 08:13:38 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
												goto  err_free_all ; 
							 
						 
					
						
							
								
									
										
										
										
											2010-02-02 13:44:10 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										} 
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
									} 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2012-11-19 08:13:36 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									/*
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  Create  directory .   cgroup_create_file ( )  returns  with  the  new 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  directory  locked  on  success  so  that  it  can  be  populated  without 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  dropping  cgroup_mutex . 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 */ 
							 
						 
					
						
							
								
									
										
										
										
											2012-11-19 08:13:36 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									err  =  cgroup_create_file ( dentry ,  S_IFDIR  |  mode ,  sb ) ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
									if  ( err  <  0 ) 
							 
						 
					
						
							
								
									
										
										
										
											2012-11-19 08:13:38 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										goto  err_free_all ; 
							 
						 
					
						
							
								
									
										
										
										
											2012-11-19 08:13:36 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									lockdep_assert_held ( & dentry - > d_inode - > i_mutex ) ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2012-11-19 08:13:36 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									/* allocation complete, commit to creation */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									list_add_tail ( & cgrp - > allcg_node ,  & root - > allcg_list ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									list_add_tail_rcu ( & cgrp - > sibling ,  & cgrp - > parent - > children ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									root - > number_of_cgroups + + ; 
							 
						 
					
						
							
								
									
										
										
										
											2012-11-19 08:13:36 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2012-11-19 08:13:38 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									/* each css holds a ref to the cgroup's dentry */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									for_each_subsys ( root ,  ss ) 
							 
						 
					
						
							
								
									
										
										
										
											2012-11-05 09:16:58 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										dget ( dentry ) ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												cgroup: make css->refcnt clearing on cgroup removal optional
Currently, cgroup removal tries to drain all css references.  If there
are active css references, the removal logic waits and retries
->pre_detroy() until either all refs drop to zero or removal is
cancelled.
This semantics is unusual and adds non-trivial complexity to cgroup
core and IMHO is fundamentally misguided in that it couples internal
implementation details (references to internal data structure) with
externally visible operation (rmdir).  To userland, this is a behavior
peculiarity which is unnecessary and difficult to expect (css refs is
otherwise invisible from userland), and, to policy implementations,
this is an unnecessary restriction (e.g. blkcg wants to hold css refs
for caching purposes but can't as that becomes visible as rmdir hang).
Unfortunately, memcg currently depends on ->pre_destroy() retrials and
cgroup removal vetoing and can't be immmediately switched to the new
behavior.  This patch introduces the new behavior of not waiting for
css refs to drain and maintains the old behavior for subsystems which
have __DEPRECATED_clear_css_refs set.
Once, memcg is updated, we can drop the code paths for the old
behavior as proposed in the following patch.  Note that the following
patch is incorrect in that dput work item is in cgroup and may lose
some of dputs when multiples css's are released back-to-back, and
__css_put() triggers check_for_release() when refcnt reaches 0 instead
of 1; however, it shows what part can be removed.
  http://thread.gmane.org/gmane.linux.kernel.containers/22559/focus=75251
Note that, in not-too-distant future, cgroup core will start emitting
warning messages for subsys which require the old behavior, so please
get moving.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizf@cn.fujitsu.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
											 
										 
										
											2012-04-01 12:09:56 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2012-11-19 08:13:38 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									/* creation succeeded, notify subsystems */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									for_each_subsys ( root ,  ss )  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										err  =  online_css ( ss ,  cgrp ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										if  ( err ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											goto  err_destroy ; 
							 
						 
					
						
							
								
									
										
										
										
											2012-11-30 17:31:23 +04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										if  ( ss - > broken_hierarchy  & &  ! ss - > warned_broken_hierarchy  & & 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										    parent - > parent )  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											pr_warning ( " cgroup: %s (%d) created nested cgroup for controller  \" %s \"  which has incomplete hierarchy support. Nested cgroups may change behavior in the future. \n " , 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
												   current - > comm ,  current - > pid ,  ss - > name ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											if  ( ! strcmp ( ss - > name ,  " memory " ) ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
												pr_warning ( " cgroup:  \" memory \"  requires setting use_hierarchy to 1 on the root. \n " ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											ss - > warned_broken_hierarchy  =  true ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										} 
							 
						 
					
						
							
								
									
										
										
										
											2012-11-09 09:12:29 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									} 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2012-08-23 16:53:31 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									err  =  cgroup_populate_dir ( cgrp ,  true ,  root - > subsys_mask ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2012-11-19 08:13:38 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									if  ( err ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										goto  err_destroy ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									mutex_unlock ( & cgroup_mutex ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:40:44 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									mutex_unlock ( & cgrp - > dentry - > d_inode - > i_mutex ) ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									return  0 ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2012-11-19 08:13:38 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								err_free_all :  
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
									for_each_subsys ( root ,  ss )  { 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:40:44 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										if  ( cgrp - > subsys [ ss - > subsys_id ] ) 
							 
						 
					
						
							
								
									
										
										
										
											2012-11-19 08:13:38 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
											ss - > css_free ( cgrp ) ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
									} 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									mutex_unlock ( & cgroup_mutex ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									/* Release the reference count that we took on the superblock */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									deactivate_super ( sb ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2012-11-19 09:02:12 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								err_free_id :  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									ida_simple_remove ( & root - > cgroup_ida ,  cgrp - > id ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2012-11-19 08:13:38 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								err_free_cgrp :  
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:40:44 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									kfree ( cgrp ) ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
									return  err ; 
							 
						 
					
						
							
								
									
										
										
										
											2012-11-19 08:13:38 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								err_destroy :  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									cgroup_destroy_locked ( cgrp ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									mutex_unlock ( & cgroup_mutex ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									mutex_unlock ( & dentry - > d_inode - > i_mutex ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									return  err ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2011-07-26 01:41:39 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								static  int  cgroup_mkdir ( struct  inode  * dir ,  struct  dentry  * dentry ,  umode_t  mode )  
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									struct  cgroup  * c_parent  =  dentry - > d_parent - > d_fsdata ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									/* the vfs holds inode->i_mutex already */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									return  cgroup_create ( c_parent ,  dentry ,  mode  |  S_IFDIR ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
											 
										
											
												cgroup: use negative bias on css->refcnt to block css_tryget()
When a cgroup is about to be removed, cgroup_clear_css_refs() is
called to check and ensure that there are no active css references.
This is currently achieved by dropping the refcnt to zero iff it has
only the base ref.  If all css refs could be dropped to zero, ref
clearing is successful and CSS_REMOVED is set on all css.  If not, the
base ref is restored.  While css ref is zero w/o CSS_REMOVED set, any
css_tryget() attempt on it busy loops so that they are atomic
w.r.t. the whole css ref clearing.
This does work but dropping and re-instating the base ref is somewhat
hairy and makes it difficult to add more logic to the put path as
there are two of them - the regular css_put() and the reversible base
ref clearing.
This patch updates css ref clearing such that blocking new
css_tryget() and putting the base ref are separate operations.
CSS_DEACT_BIAS, defined as INT_MIN, is added to css->refcnt and
css_tryget() busy loops while refcnt is negative.  After all css refs
are deactivated, if they were all one, ref clearing succeeded and
CSS_REMOVED is set and the base ref is put using the regular
css_put(); otherwise, CSS_DEACT_BIAS is subtracted from the refcnts
and the original postive values are restored.
css_refcnt() accessor which always returns the unbiased positive
reference counts is added and used to simplify refcnt usages.  While
at it, relocate and reformat comments in cgroup_has_css_refs().
This separates css->refcnt deactivation and putting the base ref,
which enables the next patch to make ref clearing optional.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizf@cn.fujitsu.com>
											 
										 
										
											2012-04-01 12:09:56 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								/*
  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  Check  the  reference  count  on  each  subsystem .  Since  we  already 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  established  that  there  are  no  tasks  in  the  cgroup ,  if  the  css  refcount 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  is  also  1 ,  then  there  should  be  no  outstanding  references ,  so  the 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  subsystem  is  safe  to  destroy .  We  scan  across  all  subsystems  rather  than 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  using  the  per - hierarchy  linked  list  of  mounted  subsystems  since  we  can 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  be  called  via  check_for_release ( )  with  no  synchronization  other  than 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  RCU ,  and  the  subsystem  linked  list  isn ' t  RCU - safe . 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 */ 
							 
						 
					
						
							
								
									
										
										
										
											2008-07-29 22:33:20 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								static  int  cgroup_has_css_refs ( struct  cgroup  * cgrp )  
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:38 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									int  i ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												cgroup: use negative bias on css->refcnt to block css_tryget()
When a cgroup is about to be removed, cgroup_clear_css_refs() is
called to check and ensure that there are no active css references.
This is currently achieved by dropping the refcnt to zero iff it has
only the base ref.  If all css refs could be dropped to zero, ref
clearing is successful and CSS_REMOVED is set on all css.  If not, the
base ref is restored.  While css ref is zero w/o CSS_REMOVED set, any
css_tryget() attempt on it busy loops so that they are atomic
w.r.t. the whole css ref clearing.
This does work but dropping and re-instating the base ref is somewhat
hairy and makes it difficult to add more logic to the put path as
there are two of them - the regular css_put() and the reversible base
ref clearing.
This patch updates css ref clearing such that blocking new
css_tryget() and putting the base ref are separate operations.
CSS_DEACT_BIAS, defined as INT_MIN, is added to css->refcnt and
css_tryget() busy loops while refcnt is negative.  After all css refs
are deactivated, if they were all one, ref clearing succeeded and
CSS_REMOVED is set and the base ref is put using the regular
css_put(); otherwise, CSS_DEACT_BIAS is subtracted from the refcnts
and the original postive values are restored.
css_refcnt() accessor which always returns the unbiased positive
reference counts is added and used to simplify refcnt usages.  While
at it, relocate and reformat comments in cgroup_has_css_refs().
This separates css->refcnt deactivation and putting the base ref,
which enables the next patch to make ref clearing optional.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizf@cn.fujitsu.com>
											 
										 
										
											2012-04-01 12:09:56 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
											 
										
											
												cgroups: revamp subsys array
This patch series provides the ability for cgroup subsystems to be
compiled as modules both within and outside the kernel tree.  This is
mainly useful for classifiers and subsystems that hook into components
that are already modules.  cls_cgroup and blkio-cgroup serve as the
example use cases for this feature.
It provides an interface cgroup_load_subsys() and cgroup_unload_subsys()
which modular subsystems can use to register and depart during runtime.
The net_cls classifier subsystem serves as the example for a subsystem
which can be converted into a module using these changes.
Patch #1 sets up the subsys[] array so its contents can be dynamic as
modules appear and (eventually) disappear.  Iterations over the array are
modified to handle when subsystems are absent, and the dynamic section of
the array is protected by cgroup_mutex.
Patch #2 implements an interface for modules to load subsystems, called
cgroup_load_subsys, similar to cgroup_init_subsys, and adds a module
pointer in struct cgroup_subsys.
Patch #3 adds a mechanism for unloading modular subsystems, which includes
a more advanced rework of the rudimentary reference counting introduced in
patch 2.
Patch #4 modifies the net_cls subsystem, which already had some module
declarations, to be configurable as a module, which also serves as a
simple proof-of-concept.
Part of implementing patches 2 and 4 involved updating css pointers in
each css_set when the module appears or leaves.  In doing this, it was
discovered that css_sets always remain linked to the dummy cgroup,
regardless of whether or not any subsystems are actually bound to it
(i.e., not mounted on an actual hierarchy).  The subsystem loading and
unloading code therefore should keep in mind the special cases where the
added subsystem is the only one in the dummy cgroup (and therefore all
css_sets need to be linked back into it) and where the removed subsys was
the only one in the dummy cgroup (and therefore all css_sets should be
unlinked from it) - however, as all css_sets always stay attached to the
dummy cgroup anyway, these cases are ignored.  Any fix that addresses this
issue should also make sure these cases are addressed in the subsystem
loading and unloading code.
This patch:
Make subsys[] able to be dynamically populated to support modular
subsystems
This patch reworks the way the subsys[] array is used so that subsystems
can register themselves after boot time, and enables the internals of
cgroups to be able to handle when subsystems are not present or may
appear/disappear.
Signed-off-by: Ben Blum <bblum@andrew.cmu.edu>
Acked-by: Li Zefan <lizf@cn.fujitsu.com>
Cc: Paul Menage <menage@google.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2010-03-10 15:22:07 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									/*
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  We  won ' t  need  to  lock  the  subsys  array ,  because  the  subsystems 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  we ' re  concerned  about  aren ' t  going  anywhere  since  our  cgroup  root 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  has  a  reference  on  them . 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 */ 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:38 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									for  ( i  =  0 ;  i  <  CGROUP_SUBSYS_COUNT ;  i + + )  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										struct  cgroup_subsys  * ss  =  subsys [ i ] ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										struct  cgroup_subsys_state  * css ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												cgroup: use negative bias on css->refcnt to block css_tryget()
When a cgroup is about to be removed, cgroup_clear_css_refs() is
called to check and ensure that there are no active css references.
This is currently achieved by dropping the refcnt to zero iff it has
only the base ref.  If all css refs could be dropped to zero, ref
clearing is successful and CSS_REMOVED is set on all css.  If not, the
base ref is restored.  While css ref is zero w/o CSS_REMOVED set, any
css_tryget() attempt on it busy loops so that they are atomic
w.r.t. the whole css ref clearing.
This does work but dropping and re-instating the base ref is somewhat
hairy and makes it difficult to add more logic to the put path as
there are two of them - the regular css_put() and the reversible base
ref clearing.
This patch updates css ref clearing such that blocking new
css_tryget() and putting the base ref are separate operations.
CSS_DEACT_BIAS, defined as INT_MIN, is added to css->refcnt and
css_tryget() busy loops while refcnt is negative.  After all css refs
are deactivated, if they were all one, ref clearing succeeded and
CSS_REMOVED is set and the base ref is put using the regular
css_put(); otherwise, CSS_DEACT_BIAS is subtracted from the refcnts
and the original postive values are restored.
css_refcnt() accessor which always returns the unbiased positive
reference counts is added and used to simplify refcnt usages.  While
at it, relocate and reformat comments in cgroup_has_css_refs().
This separates css->refcnt deactivation and putting the base ref,
which enables the next patch to make ref clearing optional.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizf@cn.fujitsu.com>
											 
										 
										
											2012-04-01 12:09:56 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
											 
										
											
												cgroups: revamp subsys array
This patch series provides the ability for cgroup subsystems to be
compiled as modules both within and outside the kernel tree.  This is
mainly useful for classifiers and subsystems that hook into components
that are already modules.  cls_cgroup and blkio-cgroup serve as the
example use cases for this feature.
It provides an interface cgroup_load_subsys() and cgroup_unload_subsys()
which modular subsystems can use to register and depart during runtime.
The net_cls classifier subsystem serves as the example for a subsystem
which can be converted into a module using these changes.
Patch #1 sets up the subsys[] array so its contents can be dynamic as
modules appear and (eventually) disappear.  Iterations over the array are
modified to handle when subsystems are absent, and the dynamic section of
the array is protected by cgroup_mutex.
Patch #2 implements an interface for modules to load subsystems, called
cgroup_load_subsys, similar to cgroup_init_subsys, and adds a module
pointer in struct cgroup_subsys.
Patch #3 adds a mechanism for unloading modular subsystems, which includes
a more advanced rework of the rudimentary reference counting introduced in
patch 2.
Patch #4 modifies the net_cls subsystem, which already had some module
declarations, to be configurable as a module, which also serves as a
simple proof-of-concept.
Part of implementing patches 2 and 4 involved updating css pointers in
each css_set when the module appears or leaves.  In doing this, it was
discovered that css_sets always remain linked to the dummy cgroup,
regardless of whether or not any subsystems are actually bound to it
(i.e., not mounted on an actual hierarchy).  The subsystem loading and
unloading code therefore should keep in mind the special cases where the
added subsystem is the only one in the dummy cgroup (and therefore all
css_sets need to be linked back into it) and where the removed subsys was
the only one in the dummy cgroup (and therefore all css_sets should be
unlinked from it) - however, as all css_sets always stay attached to the
dummy cgroup anyway, these cases are ignored.  Any fix that addresses this
issue should also make sure these cases are addressed in the subsystem
loading and unloading code.
This patch:
Make subsys[] able to be dynamically populated to support modular
subsystems
This patch reworks the way the subsys[] array is used so that subsystems
can register themselves after boot time, and enables the internals of
cgroups to be able to handle when subsystems are not present or may
appear/disappear.
Signed-off-by: Ben Blum <bblum@andrew.cmu.edu>
Acked-by: Li Zefan <lizf@cn.fujitsu.com>
Cc: Paul Menage <menage@google.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2010-03-10 15:22:07 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										/* Skip subsystems not present or not in this hierarchy */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										if  ( ss  = =  NULL  | |  ss - > root  ! =  cgrp - > root ) 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:38 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
											continue ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												cgroup: use negative bias on css->refcnt to block css_tryget()
When a cgroup is about to be removed, cgroup_clear_css_refs() is
called to check and ensure that there are no active css references.
This is currently achieved by dropping the refcnt to zero iff it has
only the base ref.  If all css refs could be dropped to zero, ref
clearing is successful and CSS_REMOVED is set on all css.  If not, the
base ref is restored.  While css ref is zero w/o CSS_REMOVED set, any
css_tryget() attempt on it busy loops so that they are atomic
w.r.t. the whole css ref clearing.
This does work but dropping and re-instating the base ref is somewhat
hairy and makes it difficult to add more logic to the put path as
there are two of them - the regular css_put() and the reversible base
ref clearing.
This patch updates css ref clearing such that blocking new
css_tryget() and putting the base ref are separate operations.
CSS_DEACT_BIAS, defined as INT_MIN, is added to css->refcnt and
css_tryget() busy loops while refcnt is negative.  After all css refs
are deactivated, if they were all one, ref clearing succeeded and
CSS_REMOVED is set and the base ref is put using the regular
css_put(); otherwise, CSS_DEACT_BIAS is subtracted from the refcnts
and the original postive values are restored.
css_refcnt() accessor which always returns the unbiased positive
reference counts is added and used to simplify refcnt usages.  While
at it, relocate and reformat comments in cgroup_has_css_refs().
This separates css->refcnt deactivation and putting the base ref,
which enables the next patch to make ref clearing optional.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizf@cn.fujitsu.com>
											 
										 
										
											2012-04-01 12:09:56 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:40:44 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										css  =  cgrp - > subsys [ ss - > subsys_id ] ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												cgroup: use negative bias on css->refcnt to block css_tryget()
When a cgroup is about to be removed, cgroup_clear_css_refs() is
called to check and ensure that there are no active css references.
This is currently achieved by dropping the refcnt to zero iff it has
only the base ref.  If all css refs could be dropped to zero, ref
clearing is successful and CSS_REMOVED is set on all css.  If not, the
base ref is restored.  While css ref is zero w/o CSS_REMOVED set, any
css_tryget() attempt on it busy loops so that they are atomic
w.r.t. the whole css ref clearing.
This does work but dropping and re-instating the base ref is somewhat
hairy and makes it difficult to add more logic to the put path as
there are two of them - the regular css_put() and the reversible base
ref clearing.
This patch updates css ref clearing such that blocking new
css_tryget() and putting the base ref are separate operations.
CSS_DEACT_BIAS, defined as INT_MIN, is added to css->refcnt and
css_tryget() busy loops while refcnt is negative.  After all css refs
are deactivated, if they were all one, ref clearing succeeded and
CSS_REMOVED is set and the base ref is put using the regular
css_put(); otherwise, CSS_DEACT_BIAS is subtracted from the refcnts
and the original postive values are restored.
css_refcnt() accessor which always returns the unbiased positive
reference counts is added and used to simplify refcnt usages.  While
at it, relocate and reformat comments in cgroup_has_css_refs().
This separates css->refcnt deactivation and putting the base ref,
which enables the next patch to make ref clearing optional.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizf@cn.fujitsu.com>
											 
										 
										
											2012-04-01 12:09:56 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										/*
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										 *  When  called  from  check_for_release ( )  it ' s  possible 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:38 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										 *  that  by  this  point  the  cgroup  has  been  removed 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										 *  and  the  css  deleted .  But  a  false - positive  doesn ' t 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										 *  matter ,  since  it  can  only  happen  if  the  cgroup 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										 *  has  been  deleted  and  hence  no  longer  needs  the 
							 
						 
					
						
							
								
									
										
											 
										
											
												cgroup: use negative bias on css->refcnt to block css_tryget()
When a cgroup is about to be removed, cgroup_clear_css_refs() is
called to check and ensure that there are no active css references.
This is currently achieved by dropping the refcnt to zero iff it has
only the base ref.  If all css refs could be dropped to zero, ref
clearing is successful and CSS_REMOVED is set on all css.  If not, the
base ref is restored.  While css ref is zero w/o CSS_REMOVED set, any
css_tryget() attempt on it busy loops so that they are atomic
w.r.t. the whole css ref clearing.
This does work but dropping and re-instating the base ref is somewhat
hairy and makes it difficult to add more logic to the put path as
there are two of them - the regular css_put() and the reversible base
ref clearing.
This patch updates css ref clearing such that blocking new
css_tryget() and putting the base ref are separate operations.
CSS_DEACT_BIAS, defined as INT_MIN, is added to css->refcnt and
css_tryget() busy loops while refcnt is negative.  After all css refs
are deactivated, if they were all one, ref clearing succeeded and
CSS_REMOVED is set and the base ref is put using the regular
css_put(); otherwise, CSS_DEACT_BIAS is subtracted from the refcnts
and the original postive values are restored.
css_refcnt() accessor which always returns the unbiased positive
reference counts is added and used to simplify refcnt usages.  While
at it, relocate and reformat comments in cgroup_has_css_refs().
This separates css->refcnt deactivation and putting the base ref,
which enables the next patch to make ref clearing optional.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizf@cn.fujitsu.com>
											 
										 
										
											2012-04-01 12:09:56 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										 *  release  agent  to  be  called  anyway . 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										 */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										if  ( css  & &  css_refcnt ( css )  >  1 ) 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:38 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
											return  1 ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									} 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									return  0 ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2012-11-19 08:13:37 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								static  int  cgroup_destroy_locked ( struct  cgroup  * cgrp )  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									__releases ( & cgroup_mutex )  __acquires ( & cgroup_mutex ) 
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
									
										
										
										
											2012-11-19 08:13:37 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									struct  dentry  * d  =  cgrp - > dentry ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									struct  cgroup  * parent  =  cgrp - > parent ; 
							 
						 
					
						
							
								
									
										
										
										
											2009-04-02 16:57:26 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									DEFINE_WAIT ( wait ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2010-03-10 15:22:34 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									struct  cgroup_event  * event ,  * tmp ; 
							 
						 
					
						
							
								
									
										
										
										
											2012-11-05 09:16:58 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									struct  cgroup_subsys  * ss ; 
							 
						 
					
						
							
								
									
										
										
										
											2012-11-28 13:50:44 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									LIST_HEAD ( tmp_list ) ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2012-11-19 08:13:37 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									lockdep_assert_held ( & d - > d_inode - > i_mutex ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									lockdep_assert_held ( & cgroup_mutex ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									if  ( atomic_read ( & cgrp - > count )  | |  ! list_empty ( & cgrp - > children ) ) 
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
										return  - EBUSY ; 
							 
						 
					
						
							
								
									
										
										
										
											2008-02-23 15:24:09 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2009-07-29 15:04:06 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									/*
 
							 
						 
					
						
							
								
									
										
										
										
											2012-11-05 09:16:59 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									 *  Block  new  css_tryget ( )  by  deactivating  refcnt  and  mark  @ cgrp 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  removed .   This  makes  future  css_tryget ( )  and  child  creation 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  attempts  fail  thus  maintaining  the  removal  conditions  verified 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  above . 
							 
						 
					
						
							
								
									
										
										
										
											2009-07-29 15:04:06 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									 */ 
							 
						 
					
						
							
								
									
										
										
										
											2012-11-05 09:16:58 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									for_each_subsys ( cgrp - > root ,  ss )  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										struct  cgroup_subsys_state  * css  =  cgrp - > subsys [ ss - > subsys_id ] ; 
							 
						 
					
						
							
								
									
										
										
										
											2009-07-29 15:04:06 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2012-11-05 09:16:58 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										WARN_ON ( atomic_read ( & css - > refcnt )  <  0 ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										atomic_add ( CSS_DEACT_BIAS ,  & css - > refcnt ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2009-07-29 15:04:06 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									} 
							 
						 
					
						
							
								
									
										
										
										
											2012-11-05 09:16:59 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									set_bit ( CGRP_REMOVED ,  & cgrp - > flags ) ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2012-11-19 08:13:37 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									/* tell subsystems to initate destruction */ 
							 
						 
					
						
							
								
									
										
										
										
											2012-11-05 09:16:59 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									for_each_subsys ( cgrp - > root ,  ss ) 
							 
						 
					
						
							
								
									
										
										
										
											2012-11-19 08:13:37 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										offline_css ( ss ,  cgrp ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2012-11-05 09:16:58 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									/*
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  Put  all  the  base  refs .   Each  css  holds  an  extra  reference  to  the 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  cgroup ' s  dentry  and  cgroup  removal  proceeds  regardless  of  css 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  refs .   On  the  last  put  of  each  css ,  whenever  that  may  be ,  the 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  extra  dentry  ref  is  put  so  that  dentry  destruction  happens  only 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  after  all  css ' s  are  released . 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 */ 
							 
						 
					
						
							
								
									
										
										
										
											2012-11-05 09:16:58 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									for_each_subsys ( cgrp - > root ,  ss ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										css_put ( cgrp - > subsys [ ss - > subsys_id ] ) ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2009-07-25 16:47:45 +02:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									raw_spin_lock ( & release_list_lock ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:40:44 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									if  ( ! list_empty ( & cgrp - > release_list ) ) 
							 
						 
					
						
							
								
									
										
										
										
											2011-03-22 16:30:13 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										list_del_init ( & cgrp - > release_list ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2009-07-25 16:47:45 +02:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									raw_spin_unlock ( & release_list_lock ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2009-01-07 18:08:36 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									/* delete this cgroup from parent->children */ 
							 
						 
					
						
							
								
									
										
										
										
											2012-11-09 09:12:29 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									list_del_rcu ( & cgrp - > sibling ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2012-04-01 12:09:54 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									list_del_init ( & cgrp - > allcg_node ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2012-11-19 08:13:37 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									dget ( d ) ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
									cgroup_d_remove_dir ( d ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									dput ( d ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:40:44 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									set_bit ( CGRP_RELEASABLE ,  & parent - > flags ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:38 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									check_for_release ( parent ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2010-03-10 15:22:34 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									/*
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  Unregister  events  and  notify  userspace . 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  Notify  userspace  about  cgroup  removing  only  after  rmdir  of  cgroup 
							 
						 
					
						
							
								
									
										
										
										
											2013-02-18 18:56:14 +08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									 *  directory  to  avoid  race  between  userspace  and  kernelspace . 
							 
						 
					
						
							
								
									
										
										
										
											2010-03-10 15:22:34 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									 */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									spin_lock ( & cgrp - > event_list_lock ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2013-02-18 18:56:14 +08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									list_for_each_entry_safe ( event ,  tmp ,  & cgrp - > event_list ,  list )  { 
							 
						 
					
						
							
								
									
										
										
										
											2012-11-28 13:50:45 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										list_del_init ( & event - > list ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2010-03-10 15:22:34 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										schedule_work ( & event - > remove ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									} 
							 
						 
					
						
							
								
									
										
										
										
											2013-02-18 18:56:14 +08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									spin_unlock ( & cgrp - > event_list_lock ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2010-03-10 15:22:34 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
									return  0 ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2012-11-19 08:13:37 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								static  int  cgroup_rmdir ( struct  inode  * unused_dir ,  struct  dentry  * dentry )  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									int  ret ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									mutex_lock ( & cgroup_mutex ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									ret  =  cgroup_destroy_locked ( dentry - > d_fsdata ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									mutex_unlock ( & cgroup_mutex ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									return  ret ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2012-04-01 12:09:55 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								static  void  __init_or_module  cgroup_init_cftsets ( struct  cgroup_subsys  * ss )  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									INIT_LIST_HEAD ( & ss - > cftsets ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									/*
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  base_cftset  is  embedded  in  subsys  itself ,  no  need  to  worry  about 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  deregistration . 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									if  ( ss - > base_cftypes )  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										ss - > base_cftset . cfts  =  ss - > base_cftypes ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										list_add_tail ( & ss - > base_cftset . node ,  & ss - > cftsets ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									} 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2008-04-29 01:00:07 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								static  void  __init  cgroup_init_subsys ( struct  cgroup_subsys  * ss )  
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									struct  cgroup_subsys_state  * css ; 
							 
						 
					
						
							
								
									
										
										
										
											2007-11-14 16:58:54 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									printk ( KERN_INFO  " Initializing cgroup subsys %s \n " ,  ss - > name ) ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2012-11-19 08:13:36 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									mutex_lock ( & cgroup_mutex ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2012-04-01 12:09:55 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									/* init base cftset */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									cgroup_init_cftsets ( ss ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
									/* Create the top cgroup state for this subsystem */ 
							 
						 
					
						
							
								
									
										
										
										
											2009-01-07 18:07:42 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									list_add ( & ss - > sibling ,  & rootnode . subsys_list ) ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
									ss - > root  =  & rootnode ; 
							 
						 
					
						
							
								
									
										
										
										
											2012-11-19 08:13:38 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									css  =  ss - > css_alloc ( dummytop ) ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
									/* We don't handle early failures gracefully */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									BUG_ON ( IS_ERR ( css ) ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									init_cgroup_css ( css ,  ss ,  dummytop ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2008-04-29 01:00:13 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									/* Update the init_css_set to contain a subsys
 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:36 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									 *  pointer  to  this  state  -  since  the  subsystem  is 
							 
						 
					
						
							
								
									
										
										
										
											2008-04-29 01:00:13 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									 *  newly  registered ,  all  tasks  and  hence  the 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  init_css_set  is  in  the  subsystem ' s  top  cgroup .  */ 
							 
						 
					
						
							
								
									
										
										
										
											2012-11-19 08:13:36 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									init_css_set . subsys [ ss - > subsys_id ]  =  css ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									need_forkexit_callback  | =  ss - > fork  | |  ss - > exit ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2008-04-29 01:00:13 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									/* At system boot, before all subsystems have been
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  registered ,  no  tasks  have  been  forked ,  so  we  don ' t 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  need  to  invoke  fork  callbacks  here .  */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									BUG_ON ( ! list_empty ( & init_task . tasks ) ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
									ss - > active  =  1 ; 
							 
						 
					
						
							
								
									
										
										
										
											2012-11-19 08:13:38 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									BUG_ON ( online_css ( ss ,  dummytop ) ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2012-11-09 09:12:29 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2012-11-19 08:13:36 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									mutex_unlock ( & cgroup_mutex ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2010-03-10 15:22:09 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									/* this function shouldn't be used with modular subsystems, since they
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  need  to  register  a  subsys_id ,  among  other  things  */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									BUG_ON ( ss - > module ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								/**
  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  cgroup_load_subsys :  load  and  register  a  modular  subsystem  at  runtime 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  @ ss :  the  subsystem  to  load 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 * 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  This  function  should  be  called  in  a  modular  subsystem ' s  initcall .  If  the 
							 
						 
					
						
							
								
									
										
										
										
											2010-03-16 11:47:56 +01:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								 *  subsystem  is  built  as  a  module ,  it  will  be  assigned  a  new  subsys_id  and  set 
							 
						 
					
						
							
								
									
										
										
										
											2010-03-10 15:22:09 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								 *  up  for  use .  If  the  subsystem  is  built - in  anyway ,  work  is  delegated  to  the 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  simpler  cgroup_init_subsys . 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								int  __init_or_module  cgroup_load_subsys ( struct  cgroup_subsys  * ss )  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									struct  cgroup_subsys_state  * css ; 
							 
						 
					
						
							
								
									
										
										
										
											2012-11-19 08:13:37 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									int  i ,  ret ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												hlist: drop the node parameter from iterators
I'm not sure why, but the hlist for each entry iterators were conceived
        list_for_each_entry(pos, head, member)
The hlist ones were greedy and wanted an extra parameter:
        hlist_for_each_entry(tpos, pos, head, member)
Why did they need an extra pos parameter? I'm not quite sure. Not only
they don't really need it, it also prevents the iterator from looking
exactly like the list iterator, which is unfortunate.
Besides the semantic patch, there was some manual work required:
 - Fix up the actual hlist iterators in linux/list.h
 - Fix up the declaration of other iterators based on the hlist ones.
 - A very small amount of places were using the 'node' parameter, this
 was modified to use 'obj->member' instead.
 - Coccinelle didn't handle the hlist_for_each_entry_safe iterator
 properly, so those had to be fixed up manually.
The semantic patch which is mostly the work of Peter Senna Tschudin is here:
@@
iterator name hlist_for_each_entry, hlist_for_each_entry_continue, hlist_for_each_entry_from, hlist_for_each_entry_rcu, hlist_for_each_entry_rcu_bh, hlist_for_each_entry_continue_rcu_bh, for_each_busy_worker, ax25_uid_for_each, ax25_for_each, inet_bind_bucket_for_each, sctp_for_each_hentry, sk_for_each, sk_for_each_rcu, sk_for_each_from, sk_for_each_safe, sk_for_each_bound, hlist_for_each_entry_safe, hlist_for_each_entry_continue_rcu, nr_neigh_for_each, nr_neigh_for_each_safe, nr_node_for_each, nr_node_for_each_safe, for_each_gfn_indirect_valid_sp, for_each_gfn_sp, for_each_host;
type T;
expression a,c,d,e;
identifier b;
statement S;
@@
-T b;
    <+... when != b
(
hlist_for_each_entry(a,
- b,
c, d) S
|
hlist_for_each_entry_continue(a,
- b,
c) S
|
hlist_for_each_entry_from(a,
- b,
c) S
|
hlist_for_each_entry_rcu(a,
- b,
c, d) S
|
hlist_for_each_entry_rcu_bh(a,
- b,
c, d) S
|
hlist_for_each_entry_continue_rcu_bh(a,
- b,
c) S
|
for_each_busy_worker(a, c,
- b,
d) S
|
ax25_uid_for_each(a,
- b,
c) S
|
ax25_for_each(a,
- b,
c) S
|
inet_bind_bucket_for_each(a,
- b,
c) S
|
sctp_for_each_hentry(a,
- b,
c) S
|
sk_for_each(a,
- b,
c) S
|
sk_for_each_rcu(a,
- b,
c) S
|
sk_for_each_from
-(a, b)
+(a)
S
+ sk_for_each_from(a) S
|
sk_for_each_safe(a,
- b,
c, d) S
|
sk_for_each_bound(a,
- b,
c) S
|
hlist_for_each_entry_safe(a,
- b,
c, d, e) S
|
hlist_for_each_entry_continue_rcu(a,
- b,
c) S
|
nr_neigh_for_each(a,
- b,
c) S
|
nr_neigh_for_each_safe(a,
- b,
c, d) S
|
nr_node_for_each(a,
- b,
c) S
|
nr_node_for_each_safe(a,
- b,
c, d) S
|
- for_each_gfn_sp(a, c, d, b) S
+ for_each_gfn_sp(a, c, d) S
|
- for_each_gfn_indirect_valid_sp(a, c, d, b) S
+ for_each_gfn_indirect_valid_sp(a, c, d) S
|
for_each_host(a,
- b,
c) S
|
for_each_host_safe(a,
- b,
c, d) S
|
for_each_mesh_entry(a,
- b,
c, d) S
)
    ...+>
[akpm@linux-foundation.org: drop bogus change from net/ipv4/raw.c]
[akpm@linux-foundation.org: drop bogus hunk from net/ipv6/raw.c]
[akpm@linux-foundation.org: checkpatch fixes]
[akpm@linux-foundation.org: fix warnings]
[akpm@linux-foudnation.org: redo intrusive kvm changes]
Tested-by: Peter Senna Tschudin <peter.senna@gmail.com>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2013-02-27 17:06:00 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									struct  hlist_node  * tmp ; 
							 
						 
					
						
							
								
									
										
										
										
											2013-01-10 11:49:27 +08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									struct  css_set  * cg ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									unsigned  long  key ; 
							 
						 
					
						
							
								
									
										
										
										
											2010-03-10 15:22:09 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									/* check name and function validity */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									if  ( ss - > name  = =  NULL  | |  strlen ( ss - > name )  >  MAX_CGROUP_TYPE_NAMELEN  | | 
							 
						 
					
						
							
								
									
										
										
										
											2012-11-19 08:13:38 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									    ss - > css_alloc  = =  NULL  | |  ss - > css_free  = =  NULL ) 
							 
						 
					
						
							
								
									
										
										
										
											2010-03-10 15:22:09 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										return  - EINVAL ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									/*
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  we  don ' t  support  callbacks  in  modular  subsystems .  this  check  is 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  before  the  ss - > module  check  for  consistency ;  a  subsystem  that  could 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  be  a  module  should  still  have  no  callbacks  even  if  the  user  isn ' t 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  compiling  it  as  one . 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									if  ( ss - > fork  | |  ss - > exit ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										return  - EINVAL ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									/*
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  an  optionally  modular  subsystem  is  built - in :  we  want  to  do  nothing , 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  since  cgroup_init_subsys  will  have  already  taken  care  of  it . 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									if  ( ss - > module  = =  NULL )  { 
							 
						 
					
						
							
								
									
										
										
										
											2012-09-13 09:50:55 +02:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										/* a sanity check */ 
							 
						 
					
						
							
								
									
										
										
										
											2010-03-10 15:22:09 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										BUG_ON ( subsys [ ss - > subsys_id ]  ! =  ss ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										return  0 ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									} 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2012-04-01 12:09:55 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									/* init base cftset */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									cgroup_init_cftsets ( ss ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2010-03-10 15:22:09 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									mutex_lock ( & cgroup_mutex ) ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												cgroup: Assign subsystem IDs during compile time
WARNING: With this change it is impossible to load external built
controllers anymore.
In case where CONFIG_NETPRIO_CGROUP=m and CONFIG_NET_CLS_CGROUP=m is
set, corresponding subsys_id should also be a constant. Up to now,
net_prio_subsys_id and net_cls_subsys_id would be of the type int and
the value would be assigned during runtime.
By switching the macro definition IS_SUBSYS_ENABLED from IS_BUILTIN
to IS_ENABLED, all *_subsys_id will have constant value. That means we
need to remove all the code which assumes a value can be assigned to
net_prio_subsys_id and net_cls_subsys_id.
A close look is necessary on the RCU part which was introduces by
following patch:
  commit f845172531fb7410c7fb7780b1a6e51ee6df7d52
  Author:	Herbert Xu <herbert@gondor.apana.org.au>  Mon May 24 09:12:34 2010
  Committer:	David S. Miller <davem@davemloft.net>  Mon May 24 09:12:34 2010
  cls_cgroup: Store classid in struct sock
  Tis code was added to init_cgroup_cls()
	  /* We can't use rcu_assign_pointer because this is an int. */
	  smp_wmb();
	  net_cls_subsys_id = net_cls_subsys.subsys_id;
  respectively to exit_cgroup_cls()
	  net_cls_subsys_id = -1;
	  synchronize_rcu();
  and in module version of task_cls_classid()
	  rcu_read_lock();
	  id = rcu_dereference(net_cls_subsys_id);
	  if (id >= 0)
		  classid = container_of(task_subsys_state(p, id),
					 struct cgroup_cls_state, css)->classid;
	  rcu_read_unlock();
Without an explicit explaination why the RCU part is needed. (The
rcu_deference was fixed by exchanging it to rcu_derefence_index_check()
in a later commit, but that is a minor detail.)
So here is my pondering why it was introduced and why it safe to
remove it now. Note that this code was copied over to net_prio the
reasoning holds for that subsystem too.
The idea behind the RCU use for net_cls_subsys_id is to make sure we
get a valid pointer back from task_subsys_state(). task_subsys_state()
is just blindly accessing the subsys array and returning the
pointer. Obviously, passing in -1 as id into task_subsys_state()
returns an invalid value (out of lower bound).
So this code makes sure that only after module is loaded and the
subsystem registered, the id is assigned.
Before unregistering the module all old readers must have left the
critical section. This is done by assigning -1 to the id and issuing a
synchronized_rcu(). Any new readers wont call task_subsys_state()
anymore and therefore it is safe to unregister the subsystem.
The new code relies on the same trick, but it looks at the subsys
pointer return by task_subsys_state() (remember the id is constant
and therefore we allways have a valid index into the subsys
array).
No precautions need to be taken during module loading
module. Eventually, all CPUs will get a valid pointer back from
task_subsys_state() because rebind_subsystem() which is called after
the module init() function will assigned subsys[net_cls_subsys_id] the
newly loaded module subsystem pointer.
When the subsystem is about to be removed, rebind_subsystem() will
called before the module exit() function. In this case,
rebind_subsys() will assign subsys[net_cls_subsys_id] a NULL pointer
and then it calls synchronize_rcu(). All old readers have left by then
the critical section. Any new reader wont access the subsystem
anymore.  At this point we are safe to unregister the subsystem. No
synchronize_rcu() call is needed.
Signed-off-by: Daniel Wagner <daniel.wagner@bmw-carit.de>
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Gao feng <gaofeng@cn.fujitsu.com>
Cc: Glauber Costa <glommer@parallels.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: John Fastabend <john.r.fastabend@intel.com>
Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: netdev@vger.kernel.org
Cc: cgroups@vger.kernel.org
											 
										 
										
											2012-09-12 16:12:07 +02:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									subsys [ ss - > subsys_id ]  =  ss ; 
							 
						 
					
						
							
								
									
										
										
										
											2010-03-10 15:22:09 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									/*
 
							 
						 
					
						
							
								
									
										
										
										
											2012-11-19 08:13:38 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									 *  no  ss - > css_alloc  seems  to  need  anything  important  in  the  ss 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  struct ,  so  this  can  happen  first  ( i . e .  before  the  rootnode 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  attachment ) . 
							 
						 
					
						
							
								
									
										
										
										
											2010-03-10 15:22:09 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									 */ 
							 
						 
					
						
							
								
									
										
										
										
											2012-11-19 08:13:38 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									css  =  ss - > css_alloc ( dummytop ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2010-03-10 15:22:09 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									if  ( IS_ERR ( css ) )  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										/* failure case - need to deassign the subsys[] slot. */ 
							 
						 
					
						
							
								
									
										
											 
										
											
												cgroup: Assign subsystem IDs during compile time
WARNING: With this change it is impossible to load external built
controllers anymore.
In case where CONFIG_NETPRIO_CGROUP=m and CONFIG_NET_CLS_CGROUP=m is
set, corresponding subsys_id should also be a constant. Up to now,
net_prio_subsys_id and net_cls_subsys_id would be of the type int and
the value would be assigned during runtime.
By switching the macro definition IS_SUBSYS_ENABLED from IS_BUILTIN
to IS_ENABLED, all *_subsys_id will have constant value. That means we
need to remove all the code which assumes a value can be assigned to
net_prio_subsys_id and net_cls_subsys_id.
A close look is necessary on the RCU part which was introduces by
following patch:
  commit f845172531fb7410c7fb7780b1a6e51ee6df7d52
  Author:	Herbert Xu <herbert@gondor.apana.org.au>  Mon May 24 09:12:34 2010
  Committer:	David S. Miller <davem@davemloft.net>  Mon May 24 09:12:34 2010
  cls_cgroup: Store classid in struct sock
  Tis code was added to init_cgroup_cls()
	  /* We can't use rcu_assign_pointer because this is an int. */
	  smp_wmb();
	  net_cls_subsys_id = net_cls_subsys.subsys_id;
  respectively to exit_cgroup_cls()
	  net_cls_subsys_id = -1;
	  synchronize_rcu();
  and in module version of task_cls_classid()
	  rcu_read_lock();
	  id = rcu_dereference(net_cls_subsys_id);
	  if (id >= 0)
		  classid = container_of(task_subsys_state(p, id),
					 struct cgroup_cls_state, css)->classid;
	  rcu_read_unlock();
Without an explicit explaination why the RCU part is needed. (The
rcu_deference was fixed by exchanging it to rcu_derefence_index_check()
in a later commit, but that is a minor detail.)
So here is my pondering why it was introduced and why it safe to
remove it now. Note that this code was copied over to net_prio the
reasoning holds for that subsystem too.
The idea behind the RCU use for net_cls_subsys_id is to make sure we
get a valid pointer back from task_subsys_state(). task_subsys_state()
is just blindly accessing the subsys array and returning the
pointer. Obviously, passing in -1 as id into task_subsys_state()
returns an invalid value (out of lower bound).
So this code makes sure that only after module is loaded and the
subsystem registered, the id is assigned.
Before unregistering the module all old readers must have left the
critical section. This is done by assigning -1 to the id and issuing a
synchronized_rcu(). Any new readers wont call task_subsys_state()
anymore and therefore it is safe to unregister the subsystem.
The new code relies on the same trick, but it looks at the subsys
pointer return by task_subsys_state() (remember the id is constant
and therefore we allways have a valid index into the subsys
array).
No precautions need to be taken during module loading
module. Eventually, all CPUs will get a valid pointer back from
task_subsys_state() because rebind_subsystem() which is called after
the module init() function will assigned subsys[net_cls_subsys_id] the
newly loaded module subsystem pointer.
When the subsystem is about to be removed, rebind_subsystem() will
called before the module exit() function. In this case,
rebind_subsys() will assign subsys[net_cls_subsys_id] a NULL pointer
and then it calls synchronize_rcu(). All old readers have left by then
the critical section. Any new reader wont access the subsystem
anymore.  At this point we are safe to unregister the subsystem. No
synchronize_rcu() call is needed.
Signed-off-by: Daniel Wagner <daniel.wagner@bmw-carit.de>
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Gao feng <gaofeng@cn.fujitsu.com>
Cc: Glauber Costa <glommer@parallels.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: John Fastabend <john.r.fastabend@intel.com>
Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: netdev@vger.kernel.org
Cc: cgroups@vger.kernel.org
											 
										 
										
											2012-09-12 16:12:07 +02:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										subsys [ ss - > subsys_id ]  =  NULL ; 
							 
						 
					
						
							
								
									
										
										
										
											2010-03-10 15:22:09 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										mutex_unlock ( & cgroup_mutex ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										return  PTR_ERR ( css ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									} 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									list_add ( & ss - > sibling ,  & rootnode . subsys_list ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									ss - > root  =  & rootnode ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									/* our new subsystem will be attached to the dummy hierarchy. */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									init_cgroup_css ( css ,  ss ,  dummytop ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									/* init_idr must be after init_cgroup_css because it sets css->id. */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									if  ( ss - > use_id )  { 
							 
						 
					
						
							
								
									
										
										
										
											2012-11-19 08:13:37 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										ret  =  cgroup_init_idr ( ss ,  css ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										if  ( ret ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											goto  err_unload ; 
							 
						 
					
						
							
								
									
										
										
										
											2010-03-10 15:22:09 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									} 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									/*
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  Now  we  need  to  entangle  the  css  into  the  existing  css_sets .  unlike 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  in  cgroup_init_subsys ,  there  are  now  multiple  css_sets ,  so  each  one 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  will  need  a  new  pointer  to  it ;  done  by  iterating  the  css_set_table . 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  furthermore ,  modifying  the  existing  css_sets  will  corrupt  the  hash 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  table  state ,  so  each  changed  css_set  will  need  its  hash  recomputed . 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  this  is  all  done  under  the  css_set_lock . 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									write_lock ( & css_set_lock ) ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												hlist: drop the node parameter from iterators
I'm not sure why, but the hlist for each entry iterators were conceived
        list_for_each_entry(pos, head, member)
The hlist ones were greedy and wanted an extra parameter:
        hlist_for_each_entry(tpos, pos, head, member)
Why did they need an extra pos parameter? I'm not quite sure. Not only
they don't really need it, it also prevents the iterator from looking
exactly like the list iterator, which is unfortunate.
Besides the semantic patch, there was some manual work required:
 - Fix up the actual hlist iterators in linux/list.h
 - Fix up the declaration of other iterators based on the hlist ones.
 - A very small amount of places were using the 'node' parameter, this
 was modified to use 'obj->member' instead.
 - Coccinelle didn't handle the hlist_for_each_entry_safe iterator
 properly, so those had to be fixed up manually.
The semantic patch which is mostly the work of Peter Senna Tschudin is here:
@@
iterator name hlist_for_each_entry, hlist_for_each_entry_continue, hlist_for_each_entry_from, hlist_for_each_entry_rcu, hlist_for_each_entry_rcu_bh, hlist_for_each_entry_continue_rcu_bh, for_each_busy_worker, ax25_uid_for_each, ax25_for_each, inet_bind_bucket_for_each, sctp_for_each_hentry, sk_for_each, sk_for_each_rcu, sk_for_each_from, sk_for_each_safe, sk_for_each_bound, hlist_for_each_entry_safe, hlist_for_each_entry_continue_rcu, nr_neigh_for_each, nr_neigh_for_each_safe, nr_node_for_each, nr_node_for_each_safe, for_each_gfn_indirect_valid_sp, for_each_gfn_sp, for_each_host;
type T;
expression a,c,d,e;
identifier b;
statement S;
@@
-T b;
    <+... when != b
(
hlist_for_each_entry(a,
- b,
c, d) S
|
hlist_for_each_entry_continue(a,
- b,
c) S
|
hlist_for_each_entry_from(a,
- b,
c) S
|
hlist_for_each_entry_rcu(a,
- b,
c, d) S
|
hlist_for_each_entry_rcu_bh(a,
- b,
c, d) S
|
hlist_for_each_entry_continue_rcu_bh(a,
- b,
c) S
|
for_each_busy_worker(a, c,
- b,
d) S
|
ax25_uid_for_each(a,
- b,
c) S
|
ax25_for_each(a,
- b,
c) S
|
inet_bind_bucket_for_each(a,
- b,
c) S
|
sctp_for_each_hentry(a,
- b,
c) S
|
sk_for_each(a,
- b,
c) S
|
sk_for_each_rcu(a,
- b,
c) S
|
sk_for_each_from
-(a, b)
+(a)
S
+ sk_for_each_from(a) S
|
sk_for_each_safe(a,
- b,
c, d) S
|
sk_for_each_bound(a,
- b,
c) S
|
hlist_for_each_entry_safe(a,
- b,
c, d, e) S
|
hlist_for_each_entry_continue_rcu(a,
- b,
c) S
|
nr_neigh_for_each(a,
- b,
c) S
|
nr_neigh_for_each_safe(a,
- b,
c, d) S
|
nr_node_for_each(a,
- b,
c) S
|
nr_node_for_each_safe(a,
- b,
c, d) S
|
- for_each_gfn_sp(a, c, d, b) S
+ for_each_gfn_sp(a, c, d) S
|
- for_each_gfn_indirect_valid_sp(a, c, d, b) S
+ for_each_gfn_indirect_valid_sp(a, c, d) S
|
for_each_host(a,
- b,
c) S
|
for_each_host_safe(a,
- b,
c, d) S
|
for_each_mesh_entry(a,
- b,
c, d) S
)
    ...+>
[akpm@linux-foundation.org: drop bogus change from net/ipv4/raw.c]
[akpm@linux-foundation.org: drop bogus hunk from net/ipv6/raw.c]
[akpm@linux-foundation.org: checkpatch fixes]
[akpm@linux-foundation.org: fix warnings]
[akpm@linux-foudnation.org: redo intrusive kvm changes]
Tested-by: Peter Senna Tschudin <peter.senna@gmail.com>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2013-02-27 17:06:00 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									hash_for_each_safe ( css_set_table ,  i ,  tmp ,  cg ,  hlist )  { 
							 
						 
					
						
							
								
									
										
										
										
											2013-01-10 11:49:27 +08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										/* skip entries that we already rehashed */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										if  ( cg - > subsys [ ss - > subsys_id ] ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											continue ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										/* remove existing entry */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										hash_del ( & cg - > hlist ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										/* set new value */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										cg - > subsys [ ss - > subsys_id ]  =  css ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										/* recompute hash and restore entry */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										key  =  css_set_hash ( cg - > subsys ) ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												hlist: drop the node parameter from iterators
I'm not sure why, but the hlist for each entry iterators were conceived
        list_for_each_entry(pos, head, member)
The hlist ones were greedy and wanted an extra parameter:
        hlist_for_each_entry(tpos, pos, head, member)
Why did they need an extra pos parameter? I'm not quite sure. Not only
they don't really need it, it also prevents the iterator from looking
exactly like the list iterator, which is unfortunate.
Besides the semantic patch, there was some manual work required:
 - Fix up the actual hlist iterators in linux/list.h
 - Fix up the declaration of other iterators based on the hlist ones.
 - A very small amount of places were using the 'node' parameter, this
 was modified to use 'obj->member' instead.
 - Coccinelle didn't handle the hlist_for_each_entry_safe iterator
 properly, so those had to be fixed up manually.
The semantic patch which is mostly the work of Peter Senna Tschudin is here:
@@
iterator name hlist_for_each_entry, hlist_for_each_entry_continue, hlist_for_each_entry_from, hlist_for_each_entry_rcu, hlist_for_each_entry_rcu_bh, hlist_for_each_entry_continue_rcu_bh, for_each_busy_worker, ax25_uid_for_each, ax25_for_each, inet_bind_bucket_for_each, sctp_for_each_hentry, sk_for_each, sk_for_each_rcu, sk_for_each_from, sk_for_each_safe, sk_for_each_bound, hlist_for_each_entry_safe, hlist_for_each_entry_continue_rcu, nr_neigh_for_each, nr_neigh_for_each_safe, nr_node_for_each, nr_node_for_each_safe, for_each_gfn_indirect_valid_sp, for_each_gfn_sp, for_each_host;
type T;
expression a,c,d,e;
identifier b;
statement S;
@@
-T b;
    <+... when != b
(
hlist_for_each_entry(a,
- b,
c, d) S
|
hlist_for_each_entry_continue(a,
- b,
c) S
|
hlist_for_each_entry_from(a,
- b,
c) S
|
hlist_for_each_entry_rcu(a,
- b,
c, d) S
|
hlist_for_each_entry_rcu_bh(a,
- b,
c, d) S
|
hlist_for_each_entry_continue_rcu_bh(a,
- b,
c) S
|
for_each_busy_worker(a, c,
- b,
d) S
|
ax25_uid_for_each(a,
- b,
c) S
|
ax25_for_each(a,
- b,
c) S
|
inet_bind_bucket_for_each(a,
- b,
c) S
|
sctp_for_each_hentry(a,
- b,
c) S
|
sk_for_each(a,
- b,
c) S
|
sk_for_each_rcu(a,
- b,
c) S
|
sk_for_each_from
-(a, b)
+(a)
S
+ sk_for_each_from(a) S
|
sk_for_each_safe(a,
- b,
c, d) S
|
sk_for_each_bound(a,
- b,
c) S
|
hlist_for_each_entry_safe(a,
- b,
c, d, e) S
|
hlist_for_each_entry_continue_rcu(a,
- b,
c) S
|
nr_neigh_for_each(a,
- b,
c) S
|
nr_neigh_for_each_safe(a,
- b,
c, d) S
|
nr_node_for_each(a,
- b,
c) S
|
nr_node_for_each_safe(a,
- b,
c, d) S
|
- for_each_gfn_sp(a, c, d, b) S
+ for_each_gfn_sp(a, c, d) S
|
- for_each_gfn_indirect_valid_sp(a, c, d, b) S
+ for_each_gfn_indirect_valid_sp(a, c, d) S
|
for_each_host(a,
- b,
c) S
|
for_each_host_safe(a,
- b,
c, d) S
|
for_each_mesh_entry(a,
- b,
c, d) S
)
    ...+>
[akpm@linux-foundation.org: drop bogus change from net/ipv4/raw.c]
[akpm@linux-foundation.org: drop bogus hunk from net/ipv6/raw.c]
[akpm@linux-foundation.org: checkpatch fixes]
[akpm@linux-foundation.org: fix warnings]
[akpm@linux-foudnation.org: redo intrusive kvm changes]
Tested-by: Peter Senna Tschudin <peter.senna@gmail.com>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2013-02-27 17:06:00 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										hash_add ( css_set_table ,  & cg - > hlist ,  key ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2010-03-10 15:22:09 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									} 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									write_unlock ( & css_set_lock ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									ss - > active  =  1 ; 
							 
						 
					
						
							
								
									
										
										
										
											2012-11-19 08:13:38 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									ret  =  online_css ( ss ,  dummytop ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									if  ( ret ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										goto  err_unload ; 
							 
						 
					
						
							
								
									
										
										
										
											2012-11-09 09:12:29 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2010-03-10 15:22:09 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									/* success! */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									mutex_unlock ( & cgroup_mutex ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									return  0 ; 
							 
						 
					
						
							
								
									
										
										
										
											2012-11-19 08:13:37 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								err_unload :  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									mutex_unlock ( & cgroup_mutex ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									/* @ss can't be mounted here as try_module_get() would fail */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									cgroup_unload_subsys ( ss ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									return  ret ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
									
										
										
										
											2010-03-10 15:22:09 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								EXPORT_SYMBOL_GPL ( cgroup_load_subsys ) ;  
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2010-03-10 15:22:09 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								/**
  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  cgroup_unload_subsys :  unload  a  modular  subsystem 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  @ ss :  the  subsystem  to  unload 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 * 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  This  function  should  be  called  in  a  modular  subsystem ' s  exitcall .  When  this 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  function  is  invoked ,  the  refcount  on  the  subsystem ' s  module  will  be  0 ,  so 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  the  subsystem  will  not  be  attached  to  any  hierarchy . 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								void  cgroup_unload_subsys ( struct  cgroup_subsys  * ss )  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									struct  cg_cgroup_link  * link ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									BUG_ON ( ss - > module  = =  NULL ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									/*
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  we  shouldn ' t  be  called  if  the  subsystem  is  in  use ,  and  the  use  of 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  try_module_get  in  parse_cgroupfs_options  should  ensure  that  it 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  doesn ' t  start  being  used  while  we ' re  killing  it  off . 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									BUG_ON ( ss - > root  ! =  & rootnode ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									mutex_lock ( & cgroup_mutex ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2012-11-19 08:13:37 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2012-11-19 08:13:37 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									offline_css ( ss ,  dummytop ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2012-11-19 08:13:37 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									ss - > active  =  0 ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2013-02-27 17:03:49 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									if  ( ss - > use_id ) 
							 
						 
					
						
							
								
									
										
										
										
											2012-11-19 08:13:37 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										idr_destroy ( & ss - > idr ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2010-03-10 15:22:09 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									/* deassign the subsys_id */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									subsys [ ss - > subsys_id ]  =  NULL ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									/* remove subsystem from rootnode's list of subsystems */ 
							 
						 
					
						
							
								
									
										
										
										
											2011-03-22 16:30:13 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									list_del_init ( & ss - > sibling ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2010-03-10 15:22:09 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									/*
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  disentangle  the  css  from  all  css_sets  attached  to  the  dummytop .  as 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  in  loading ,  we  need  to  pay  our  respects  to  the  hashtable  gods . 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									write_lock ( & css_set_lock ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									list_for_each_entry ( link ,  & dummytop - > css_sets ,  cgrp_link_list )  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										struct  css_set  * cg  =  link - > cg ; 
							 
						 
					
						
							
								
									
										
										
										
											2013-01-10 11:49:27 +08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										unsigned  long  key ; 
							 
						 
					
						
							
								
									
										
										
										
											2010-03-10 15:22:09 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2013-01-10 11:49:27 +08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										hash_del ( & cg - > hlist ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2010-03-10 15:22:09 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										cg - > subsys [ ss - > subsys_id ]  =  NULL ; 
							 
						 
					
						
							
								
									
										
										
										
											2013-01-10 11:49:27 +08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										key  =  css_set_hash ( cg - > subsys ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										hash_add ( css_set_table ,  & cg - > hlist ,  key ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2010-03-10 15:22:09 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									} 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									write_unlock ( & css_set_lock ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									/*
 
							 
						 
					
						
							
								
									
										
										
										
											2012-11-19 08:13:38 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									 *  remove  subsystem ' s  css  from  the  dummytop  and  free  it  -  need  to 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  free  before  marking  as  null  because  ss - > css_free  needs  the 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  cgrp - > subsys  pointer  to  find  their  state .  note  that  this  also 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  takes  care  of  freeing  the  css_id . 
							 
						 
					
						
							
								
									
										
										
										
											2010-03-10 15:22:09 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									 */ 
							 
						 
					
						
							
								
									
										
										
										
											2012-11-19 08:13:38 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									ss - > css_free ( dummytop ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2010-03-10 15:22:09 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									dummytop - > subsys [ ss - > subsys_id ]  =  NULL ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									mutex_unlock ( & cgroup_mutex ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								EXPORT_SYMBOL_GPL ( cgroup_unload_subsys ) ;  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								/**
  
						 
					
						
							
								
									
										
										
										
											2008-02-23 15:24:09 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								 *  cgroup_init_early  -  cgroup  initialization  at  system  boot 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 * 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  Initialize  cgroups  at  system  boot ,  and  initialize  any 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  subsystems  that  request  early  init . 
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								 */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								int  __init  cgroup_init_early ( void )  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									int  i ; 
							 
						 
					
						
							
								
									
										
										
										
											2008-10-18 20:28:03 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									atomic_set ( & init_css_set . refcount ,  1 ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:36 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									INIT_LIST_HEAD ( & init_css_set . cg_links ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									INIT_LIST_HEAD ( & init_css_set . tasks ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2008-04-29 01:00:11 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									INIT_HLIST_NODE ( & init_css_set . hlist ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:36 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									css_set_count  =  1 ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
									init_cgroup_root ( & rootnode ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:36 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									root_count  =  1 ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									init_task . cgroups  =  & init_css_set ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									init_css_set_link . cg  =  & init_css_set ; 
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:22 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									init_css_set_link . cgrp  =  dummytop ; 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:40:44 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									list_add ( & init_css_set_link . cgrp_link_list , 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:36 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										 & rootnode . top_cgroup . css_sets ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									list_add ( & init_css_set_link . cg_link_list , 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										 & init_css_set . cg_links ) ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2012-09-13 09:50:55 +02:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									for  ( i  =  0 ;  i  <  CGROUP_SUBSYS_COUNT ;  i + + )  { 
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
										struct  cgroup_subsys  * ss  =  subsys [ i ] ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2012-09-13 09:50:55 +02:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										/* at bootup time, we don't worry about modular subsystems */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										if  ( ! ss  | |  ss - > module ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											continue ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
										BUG_ON ( ! ss - > name ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										BUG_ON ( strlen ( ss - > name )  >  MAX_CGROUP_TYPE_NAMELEN ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2012-11-19 08:13:38 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										BUG_ON ( ! ss - > css_alloc ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										BUG_ON ( ! ss - > css_free ) ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
										if  ( ss - > subsys_id  ! =  i )  { 
							 
						 
					
						
							
								
									
										
										
										
											2007-11-14 16:58:54 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
											printk ( KERN_ERR  " cgroup: Subsys %s id == %d \n " , 
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
											       ss - > name ,  ss - > subsys_id ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											BUG ( ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										} 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										if  ( ss - > early_init ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											cgroup_init_subsys ( ss ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									} 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									return  0 ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								/**
  
						 
					
						
							
								
									
										
										
										
											2008-02-23 15:24:09 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								 *  cgroup_init  -  cgroup  initialization 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 * 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  Register  cgroup  filesystem  and  / proc  file ,  and  initialize 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  any  subsystems  that  didn ' t  request  early  init . 
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								 */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								int  __init  cgroup_init ( void )  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									int  err ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									int  i ; 
							 
						 
					
						
							
								
									
										
										
										
											2013-01-10 11:49:27 +08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									unsigned  long  key ; 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:35 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									err  =  bdi_init ( & cgroup_backing_dev_info ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									if  ( err ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										return  err ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2012-09-13 09:50:55 +02:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									for  ( i  =  0 ;  i  <  CGROUP_SUBSYS_COUNT ;  i + + )  { 
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
										struct  cgroup_subsys  * ss  =  subsys [ i ] ; 
							 
						 
					
						
							
								
									
										
										
										
											2012-09-13 09:50:55 +02:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										/* at bootup time, we don't worry about modular subsystems */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										if  ( ! ss  | |  ss - > module ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											continue ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
										if  ( ! ss - > early_init ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											cgroup_init_subsys ( ss ) ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												cgroup: CSS ID support
Patch for Per-CSS(Cgroup Subsys State) ID and private hierarchy code.
This patch attaches unique ID to each css and provides following.
 - css_lookup(subsys, id)
   returns pointer to struct cgroup_subysys_state of id.
 - css_get_next(subsys, id, rootid, depth, foundid)
   returns the next css under "root" by scanning
When cgroup_subsys->use_id is set, an id for css is maintained.
The cgroup framework only parepares
	- css_id of root css for subsys
	- id is automatically attached at creation of css.
	- id is *not* freed automatically. Because the cgroup framework
	  don't know lifetime of cgroup_subsys_state.
	  free_css_id() function is provided. This must be called by subsys.
There are several reasons to develop this.
	- Saving space .... For example, memcg's swap_cgroup is array of
	  pointers to cgroup. But it is not necessary to be very fast.
	  By replacing pointers(8bytes per ent) to ID (2byes per ent), we can
	  reduce much amount of memory usage.
	- Scanning without lock.
	  CSS_ID provides "scan id under this ROOT" function. By this, scanning
	  css under root can be written without locks.
	  ex)
	  do {
		rcu_read_lock();
		next = cgroup_get_next(subsys, id, root, &found);
		/* check sanity of next here */
		css_tryget();
		rcu_read_unlock();
		id = found + 1
	 } while(...)
Characteristics:
	- Each css has unique ID under subsys.
	- Lifetime of ID is controlled by subsys.
	- css ID contains "ID" and "Depth in hierarchy" and stack of hierarchy
	- Allowed ID is 1-65535, ID 0 is UNUSED ID.
Design Choices:
	- scan-by-ID v.s. scan-by-tree-walk.
	  As /proc's pid scan does, scan-by-ID is robust when scanning is done
	  by following kind of routine.
	  scan -> rest a while(release a lock) -> conitunue from interrupted
	  memcg's hierarchical reclaim does this.
	- When subsys->use_id is set, # of css in the system is limited to
	  65535.
[bharata@linux.vnet.ibm.com: remove rcu_read_lock() from css_get_next()]
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Paul Menage <menage@google.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2009-04-02 16:57:25 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										if  ( ss - > use_id ) 
							 
						 
					
						
							
								
									
										
										
										
											2010-03-10 15:22:09 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
											cgroup_init_idr ( ss ,  init_css_set . subsys [ ss - > subsys_id ] ) ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
									} 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2008-04-29 01:00:11 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									/* Add init_css_set to the hash table */ 
							 
						 
					
						
							
								
									
										
										
										
											2013-01-10 11:49:27 +08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									key  =  css_set_hash ( init_css_set . subsys ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									hash_add ( css_set_table ,  & init_css_set . hlist ,  key ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:23 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									BUG_ON ( ! init_root_id ( & rootnode ) ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2010-08-05 13:53:35 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									cgroup_kobj  =  kobject_create_and_add ( " cgroup " ,  fs_kobj ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									if  ( ! cgroup_kobj )  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										err  =  - ENOMEM ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										goto  out ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									} 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
									err  =  register_filesystem ( & cgroup_fs_type ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2010-08-05 13:53:35 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									if  ( err  <  0 )  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										kobject_put ( cgroup_kobj ) ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
										goto  out ; 
							 
						 
					
						
							
								
									
										
										
										
											2010-08-05 13:53:35 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									} 
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2008-04-29 01:00:08 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									proc_create ( " cgroups " ,  0 ,  NULL ,  & proc_cgroupstats_operations ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:35 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								out :  
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:35 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									if  ( err ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										bdi_destroy ( & cgroup_backing_dev_info ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
									return  err ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:33 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:35 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								/*
  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  proc_cgroup_show ( ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *   -  Print  task ' s  cgroup  paths  into  seq_file ,  one  line  for  each  hierarchy 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *   -  Used  for  / proc / < pid > / cgroup . 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *   -  No  need  to  task_lock ( tsk )  on  this  tsk - > cgroup  reference ,  as  it 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *     doesn ' t  really  matter  if  tsk - > cgroup  changes  after  we  read  it , 
							 
						 
					
						
							
								
									
										
										
										
											2008-02-07 00:14:43 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								 *     and  we  take  cgroup_mutex ,  keeping  cgroup_attach_task ( )  from  changing  it 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:35 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								 *     anyway .   No  need  to  check  that  tsk - > cgroup  ! =  NULL ,  thanks  to 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *     the_top_cgroup_hack  in  cgroup_exit ( ) ,  which  sets  an  exiting  tasks 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *     cgroup  to  top_cgroup . 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								/* TODO: Use a proper seq_file iterator */  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								static  int  proc_cgroup_show ( struct  seq_file  * m ,  void  * v )  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									struct  pid  * pid ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									struct  task_struct  * tsk ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									char  * buf ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									int  retval ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									struct  cgroupfs_root  * root ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									retval  =  - ENOMEM ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									buf  =  kmalloc ( PAGE_SIZE ,  GFP_KERNEL ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									if  ( ! buf ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										goto  out ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									retval  =  - ESRCH ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									pid  =  m - > private ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									tsk  =  get_pid_task ( pid ,  PIDTYPE_PID ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									if  ( ! tsk ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										goto  out_free ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									retval  =  0 ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									mutex_lock ( & cgroup_mutex ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2009-01-07 18:07:41 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									for_each_active_root ( root )  { 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:35 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										struct  cgroup_subsys  * ss ; 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:40:44 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										struct  cgroup  * cgrp ; 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:35 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										int  count  =  0 ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:23 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										seq_printf ( m ,  " %d: " ,  root - > hierarchy_id ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:35 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										for_each_subsys ( root ,  ss ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											seq_printf ( m ,  " %s%s " ,  count + +  ?  " , "  :  " " ,  ss - > name ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:19 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										if  ( strlen ( root - > name ) ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											seq_printf ( m ,  " %sname=%s " ,  count  ?  " , "  :  " " , 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
												   root - > name ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:35 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										seq_putc ( m ,  ' : ' ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:22 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										cgrp  =  task_cgroup_from_root ( tsk ,  root ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:40:44 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										retval  =  cgroup_path ( cgrp ,  buf ,  PAGE_SIZE ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:35 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										if  ( retval  <  0 ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											goto  out_unlock ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										seq_puts ( m ,  buf ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										seq_putc ( m ,  ' \n ' ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									} 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								out_unlock :  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									mutex_unlock ( & cgroup_mutex ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									put_task_struct ( tsk ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								out_free :  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									kfree ( buf ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								out :  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									return  retval ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								static  int  cgroup_open ( struct  inode  * inode ,  struct  file  * file )  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									struct  pid  * pid  =  PROC_I ( inode ) - > pid ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									return  single_open ( file ,  proc_cgroup_show ,  pid ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2009-10-01 15:43:56 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								const  struct  file_operations  proc_cgroup_operations  =  {  
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:35 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									. open 		=  cgroup_open , 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									. read 		=  seq_read , 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									. llseek 		=  seq_lseek , 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									. release 	=  single_release , 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								} ;  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								/* Display information about each subsystem and each hierarchy */  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								static  int  proc_cgroupstats_show ( struct  seq_file  * m ,  void  * v )  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									int  i ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2008-04-04 14:29:57 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									seq_puts ( m ,  " #subsys_name \t hierarchy \t num_cgroups \t enabled \n " ) ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												cgroups: revamp subsys array
This patch series provides the ability for cgroup subsystems to be
compiled as modules both within and outside the kernel tree.  This is
mainly useful for classifiers and subsystems that hook into components
that are already modules.  cls_cgroup and blkio-cgroup serve as the
example use cases for this feature.
It provides an interface cgroup_load_subsys() and cgroup_unload_subsys()
which modular subsystems can use to register and depart during runtime.
The net_cls classifier subsystem serves as the example for a subsystem
which can be converted into a module using these changes.
Patch #1 sets up the subsys[] array so its contents can be dynamic as
modules appear and (eventually) disappear.  Iterations over the array are
modified to handle when subsystems are absent, and the dynamic section of
the array is protected by cgroup_mutex.
Patch #2 implements an interface for modules to load subsystems, called
cgroup_load_subsys, similar to cgroup_init_subsys, and adds a module
pointer in struct cgroup_subsys.
Patch #3 adds a mechanism for unloading modular subsystems, which includes
a more advanced rework of the rudimentary reference counting introduced in
patch 2.
Patch #4 modifies the net_cls subsystem, which already had some module
declarations, to be configurable as a module, which also serves as a
simple proof-of-concept.
Part of implementing patches 2 and 4 involved updating css pointers in
each css_set when the module appears or leaves.  In doing this, it was
discovered that css_sets always remain linked to the dummy cgroup,
regardless of whether or not any subsystems are actually bound to it
(i.e., not mounted on an actual hierarchy).  The subsystem loading and
unloading code therefore should keep in mind the special cases where the
added subsystem is the only one in the dummy cgroup (and therefore all
css_sets need to be linked back into it) and where the removed subsys was
the only one in the dummy cgroup (and therefore all css_sets should be
unlinked from it) - however, as all css_sets always stay attached to the
dummy cgroup anyway, these cases are ignored.  Any fix that addresses this
issue should also make sure these cases are addressed in the subsystem
loading and unloading code.
This patch:
Make subsys[] able to be dynamically populated to support modular
subsystems
This patch reworks the way the subsys[] array is used so that subsystems
can register themselves after boot time, and enables the internals of
cgroups to be able to handle when subsystems are not present or may
appear/disappear.
Signed-off-by: Ben Blum <bblum@andrew.cmu.edu>
Acked-by: Li Zefan <lizf@cn.fujitsu.com>
Cc: Paul Menage <menage@google.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2010-03-10 15:22:07 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									/*
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  ideally  we  don ' t  want  subsystems  moving  around  while  we  do  this . 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  cgroup_mutex  is  also  necessary  to  guarantee  an  atomic  snapshot  of 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  subsys / hierarchy  state . 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 */ 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:35 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									mutex_lock ( & cgroup_mutex ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									for  ( i  =  0 ;  i  <  CGROUP_SUBSYS_COUNT ;  i + + )  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										struct  cgroup_subsys  * ss  =  subsys [ i ] ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												cgroups: revamp subsys array
This patch series provides the ability for cgroup subsystems to be
compiled as modules both within and outside the kernel tree.  This is
mainly useful for classifiers and subsystems that hook into components
that are already modules.  cls_cgroup and blkio-cgroup serve as the
example use cases for this feature.
It provides an interface cgroup_load_subsys() and cgroup_unload_subsys()
which modular subsystems can use to register and depart during runtime.
The net_cls classifier subsystem serves as the example for a subsystem
which can be converted into a module using these changes.
Patch #1 sets up the subsys[] array so its contents can be dynamic as
modules appear and (eventually) disappear.  Iterations over the array are
modified to handle when subsystems are absent, and the dynamic section of
the array is protected by cgroup_mutex.
Patch #2 implements an interface for modules to load subsystems, called
cgroup_load_subsys, similar to cgroup_init_subsys, and adds a module
pointer in struct cgroup_subsys.
Patch #3 adds a mechanism for unloading modular subsystems, which includes
a more advanced rework of the rudimentary reference counting introduced in
patch 2.
Patch #4 modifies the net_cls subsystem, which already had some module
declarations, to be configurable as a module, which also serves as a
simple proof-of-concept.
Part of implementing patches 2 and 4 involved updating css pointers in
each css_set when the module appears or leaves.  In doing this, it was
discovered that css_sets always remain linked to the dummy cgroup,
regardless of whether or not any subsystems are actually bound to it
(i.e., not mounted on an actual hierarchy).  The subsystem loading and
unloading code therefore should keep in mind the special cases where the
added subsystem is the only one in the dummy cgroup (and therefore all
css_sets need to be linked back into it) and where the removed subsys was
the only one in the dummy cgroup (and therefore all css_sets should be
unlinked from it) - however, as all css_sets always stay attached to the
dummy cgroup anyway, these cases are ignored.  Any fix that addresses this
issue should also make sure these cases are addressed in the subsystem
loading and unloading code.
This patch:
Make subsys[] able to be dynamically populated to support modular
subsystems
This patch reworks the way the subsys[] array is used so that subsystems
can register themselves after boot time, and enables the internals of
cgroups to be able to handle when subsystems are not present or may
appear/disappear.
Signed-off-by: Ben Blum <bblum@andrew.cmu.edu>
Acked-by: Li Zefan <lizf@cn.fujitsu.com>
Cc: Paul Menage <menage@google.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2010-03-10 15:22:07 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										if  ( ss  = =  NULL ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											continue ; 
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:23 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										seq_printf ( m ,  " %s \t %d \t %d \t %d \n " , 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											   ss - > name ,  ss - > root - > hierarchy_id , 
							 
						 
					
						
							
								
									
										
										
										
											2008-04-04 14:29:57 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
											   ss - > root - > number_of_cgroups ,  ! ss - > disabled ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:35 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									} 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									mutex_unlock ( & cgroup_mutex ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									return  0 ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								static  int  cgroupstats_open ( struct  inode  * inode ,  struct  file  * file )  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
									
										
										
										
											2008-03-29 03:07:28 +00:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									return  single_open ( file ,  proc_cgroupstats_show ,  NULL ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:35 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2009-10-01 15:43:56 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								static  const  struct  file_operations  proc_cgroupstats_operations  =  {  
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:35 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									. open  =  cgroupstats_open , 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									. read  =  seq_read , 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									. llseek  =  seq_lseek , 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									. release  =  single_release , 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								} ;  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:33 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								/**
  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  cgroup_fork  -  attach  newly  forked  task  to  its  parents  cgroup . 
							 
						 
					
						
							
								
									
										
										
										
											2008-02-23 15:24:09 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								 *  @ child :  pointer  to  task_struct  of  forking  parent  process . 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:33 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								 * 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  Description :  A  task  inherits  its  parent ' s  cgroup  at  fork ( ) . 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 * 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  A  pointer  to  the  shared  css_set  was  automatically  copied  in 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  fork . c  by  dup_task_struct ( ) .   However ,  we  ignore  that  copy ,  since 
							 
						 
					
						
							
								
									
										
										
										
											2012-10-18 17:52:07 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								 *  it  was  not  made  under  the  protection  of  RCU  or  cgroup_mutex ,  so 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  might  no  longer  be  a  valid  cgroup  pointer .   cgroup_attach_task ( )  might 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  have  already  changed  current - > cgroups ,  allowing  the  previously 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  referenced  cgroup  group  to  be  removed  and  freed . 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:33 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								 * 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  At  the  point  that  cgroup_fork ( )  is  called ,  ' current '  is  the  parent 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  task ,  and  the  passed  argument  ' child '  points  to  the  child  task . 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								void  cgroup_fork ( struct  task_struct  * child )  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
									
										
										
										
											2012-10-18 17:52:07 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									task_lock ( current ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:36 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									child - > cgroups  =  current - > cgroups ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									get_css_set ( child - > cgroups ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2012-10-18 17:52:07 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									task_unlock ( current ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:36 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									INIT_LIST_HEAD ( & child - > cg_list ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:33 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:36 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								/**
  
						 
					
						
							
								
									
										
										
										
											2008-02-23 15:24:09 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								 *  cgroup_post_fork  -  called  on  a  new  task  after  adding  it  to  the  task  list 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  @ child :  the  task  in  question 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 * 
							 
						 
					
						
							
								
									
										
										
										
											2012-10-16 15:03:14 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								 *  Adds  the  task  to  the  list  running  through  its  css_set  if  necessary  and 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  call  the  subsystem  fork ( )  callbacks .   Has  to  be  after  the  task  is 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  visible  on  the  task  list  in  case  we  race  with  the  first  call  to 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  cgroup_iter_start ( )  -  to  guarantee  that  the  new  task  ends  up  on  its 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  list . 
							 
						 
					
						
							
								
									
										
										
										
											2008-02-23 15:24:09 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								 */ 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:36 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								void  cgroup_post_fork ( struct  task_struct  * child )  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
									
										
										
										
											2012-10-16 15:03:14 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									int  i ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2012-02-08 03:37:27 +01:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									/*
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  use_task_css_set_links  is  set  to  1  before  we  walk  the  tasklist 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  under  the  tasklist_lock  and  we  read  it  here  after  we  added  the  child 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  to  the  tasklist  under  the  tasklist_lock  as  well .  If  the  child  wasn ' t 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  yet  in  the  tasklist  when  we  walked  through  it  from 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  cgroup_enable_task_cg_lists ( ) ,  then  use_task_css_set_links  value 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  should  be  visible  now  due  to  the  paired  locking  and  barriers  implied 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  by  LOCK / UNLOCK :  it  is  written  before  the  tasklist_lock  unlock 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  in  cgroup_enable_task_cg_lists ( )  and  read  here  after  the  tasklist_lock 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  lock  on  fork . 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 */ 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:36 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									if  ( use_task_css_set_links )  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										write_lock ( & css_set_lock ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2012-10-18 17:40:30 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										task_lock ( child ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										if  ( list_empty ( & child - > cg_list ) ) 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:36 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
											list_add ( & child - > cg_list ,  & child - > cgroups - > tasks ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2012-10-18 17:40:30 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										task_unlock ( child ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:36 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										write_unlock ( & css_set_lock ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									} 
							 
						 
					
						
							
								
									
										
										
										
											2012-10-16 15:03:14 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									/*
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  Call  ss - > fork ( ) .   This  must  happen  after  @ child  is  linked  on 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  css_set ;  otherwise ,  @ child  might  change  state  between  - > fork ( ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  and  addition  to  css_set . 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									if  ( need_forkexit_callback )  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										for  ( i  =  0 ;  i  <  CGROUP_SUBSYS_COUNT ;  i + + )  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											struct  cgroup_subsys  * ss  =  subsys [ i ] ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											/*
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											 *  fork / exit  callbacks  are  supported  only  for 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											 *  builtin  subsystems  and  we  don ' t  need  further 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											 *  synchronization  as  they  never  go  away . 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											 */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											if  ( ! ss  | |  ss - > module ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
												continue ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											if  ( ss - > fork ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
												ss - > fork ( child ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										} 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									} 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:36 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
									
										
										
										
											2012-10-16 15:03:14 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:33 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								/**
  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  cgroup_exit  -  detach  cgroup  from  exiting  task 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  @ tsk :  pointer  to  task_struct  of  exiting  process 
							 
						 
					
						
							
								
									
										
										
										
											2008-02-23 15:24:09 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								 *  @ run_callback :  run  exit  callbacks ? 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:33 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								 * 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  Description :  Detach  cgroup  from  @ tsk  and  release  it . 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 * 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  Note  that  cgroups  marked  notify_on_release  force  every  task  in 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  them  to  take  the  global  cgroup_mutex  mutex  when  exiting . 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  This  could  impact  scaling  on  very  large  systems .   Be  reluctant  to 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  use  notify_on_release  cgroups  where  very  high  task  exit  scaling 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  is  required  on  large  systems . 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 * 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  the_top_cgroup_hack : 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 * 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *     Set  the  exiting  tasks  cgroup  to  the  root  cgroup  ( top_cgroup ) . 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 * 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *     We  call  cgroup_exit ( )  while  the  task  is  still  competent  to 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *     handle  notify_on_release ( ) ,  then  leave  the  task  attached  to  the 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *     root  cgroup  in  each  hierarchy  for  the  remainder  of  its  exit . 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 * 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *     To  do  this  properly ,  we  would  increment  the  reference  count  on 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *     top_cgroup ,  and  near  the  very  end  of  the  kernel / exit . c  do_exit ( ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *     code  we  would  add  a  second  cgroup  function  call ,  to  drop  that 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *     reference .   This  would  just  create  an  unnecessary  hot  spot  on 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *     the  top_cgroup  reference  count ,  to  no  avail . 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 * 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *     Normally ,  holding  a  reference  to  a  cgroup  without  bumping  its 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *     count  is  unsafe .    The  cgroup  could  go  away ,  or  someone  could 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *     attach  us  to  a  different  cgroup ,  decrementing  the  count  on 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *     the  first  cgroup  that  we  never  incremented .   But  in  this  case , 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *     top_cgroup  isn ' t  going  away ,  and  either  task  has  PF_EXITING  set , 
							 
						 
					
						
							
								
									
										
										
										
											2008-02-07 00:14:43 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								 *     which  wards  off  any  cgroup_attach_task ( )  attempts ,  or  task  is  a  failed 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *     fork ,  never  visible  to  cgroup_attach_task . 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:33 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								 */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								void  cgroup_exit ( struct  task_struct  * tsk ,  int  run_callbacks )  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:36 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									struct  css_set  * cg ; 
							 
						 
					
						
							
								
									
										
										
										
											2011-02-07 17:02:20 +01:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									int  i ; 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:36 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									/*
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  Unlink  from  the  css_set  task  list  if  necessary . 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  Optimistically  check  cg_list  before  taking 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  css_set_lock 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									if  ( ! list_empty ( & tsk - > cg_list ) )  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										write_lock ( & css_set_lock ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										if  ( ! list_empty ( & tsk - > cg_list ) ) 
							 
						 
					
						
							
								
									
										
										
										
											2011-03-22 16:30:13 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
											list_del_init ( & tsk - > cg_list ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:36 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										write_unlock ( & css_set_lock ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									} 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:33 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									/* Reassign the task to the init_css_set. */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									task_lock ( tsk ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:36 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									cg  =  tsk - > cgroups ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									tsk - > cgroups  =  & init_css_set ; 
							 
						 
					
						
							
								
									
										
										
										
											2011-02-07 17:02:20 +01:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									if  ( run_callbacks  & &  need_forkexit_callback )  { 
							 
						 
					
						
							
								
									
										
										
										
											2012-09-13 09:50:55 +02:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										for  ( i  =  0 ;  i  <  CGROUP_SUBSYS_COUNT ;  i + + )  { 
							 
						 
					
						
							
								
									
										
										
										
											2011-02-07 17:02:20 +01:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
											struct  cgroup_subsys  * ss  =  subsys [ i ] ; 
							 
						 
					
						
							
								
									
										
										
										
											2012-09-13 09:50:55 +02:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											/* modular subsystems can't use callbacks */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											if  ( ! ss  | |  ss - > module ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
												continue ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2011-02-07 17:02:20 +01:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
											if  ( ss - > exit )  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
												struct  cgroup  * old_cgrp  = 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
													rcu_dereference_raw ( cg - > subsys [ i ] ) - > cgroup ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
												struct  cgroup  * cgrp  =  task_cgroup ( tsk ,  i ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2012-01-31 13:47:36 +08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
												ss - > exit ( cgrp ,  old_cgrp ,  tsk ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2011-02-07 17:02:20 +01:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
											} 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										} 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									} 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:33 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									task_unlock ( tsk ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2011-02-07 17:02:20 +01:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2013-01-24 14:43:51 +08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									put_css_set_taskexit ( cg ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:33 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:34 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2008-02-23 15:24:09 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								/**
  
						 
					
						
							
								
									
										
										
										
											2009-04-02 16:57:23 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								 *  cgroup_is_descendant  -  see  if  @ cgrp  is  a  descendant  of  @ task ' s  cgrp 
							 
						 
					
						
							
								
									
										
										
										
											2008-02-23 15:24:09 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								 *  @ cgrp :  the  cgroup  in  question 
							 
						 
					
						
							
								
									
										
										
										
											2009-04-02 16:57:23 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								 *  @ task :  the  task  in  question 
							 
						 
					
						
							
								
									
										
										
										
											2008-02-23 15:24:09 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								 * 
							 
						 
					
						
							
								
									
										
										
										
											2009-04-02 16:57:23 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								 *  See  if  @ cgrp  is  a  descendant  of  @ task ' s  cgroup  in  the  appropriate 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  hierarchy . 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:34 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								 * 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  If  we  are  sending  in  dummytop ,  then  presumably  we  are  creating 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  the  top  cgroup  in  the  subsystem . 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 * 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  Called  only  by  the  ns  ( nsproxy )  cgroup . 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 */ 
							 
						 
					
						
							
								
									
										
										
										
											2009-04-02 16:57:23 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								int  cgroup_is_descendant ( const  struct  cgroup  * cgrp ,  struct  task_struct  * task )  
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:34 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									int  ret ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									struct  cgroup  * target ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:40:44 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									if  ( cgrp  = =  dummytop ) 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:34 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										return  1 ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:22 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									target  =  task_cgroup_from_root ( task ,  cgrp - > root ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:40:44 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									while  ( cgrp  ! =  target  & &  cgrp ! =  cgrp - > top_cgroup ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										cgrp  =  cgrp - > parent ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									ret  =  ( cgrp  = =  target ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:34 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									return  ret ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:38 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:40:44 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								static  void  check_for_release ( struct  cgroup  * cgrp )  
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:38 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									/* All of these checks rely on RCU to keep the cgroup
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  structure  alive  */ 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:40:44 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									if  ( cgroup_is_releasable ( cgrp )  & &  ! atomic_read ( & cgrp - > count ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									    & &  list_empty ( & cgrp - > children )  & &  ! cgroup_has_css_refs ( cgrp ) )  { 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:38 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										/* Control Group is currently removeable. If it's not
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										 *  already  queued  for  a  userspace  notification ,  queue 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										 *  it  now  */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										int  need_schedule_work  =  0 ; 
							 
						 
					
						
							
								
									
										
										
										
											2009-07-25 16:47:45 +02:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										raw_spin_lock ( & release_list_lock ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:40:44 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										if  ( ! cgroup_is_removed ( cgrp )  & & 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										    list_empty ( & cgrp - > release_list ) )  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											list_add ( & cgrp - > release_list ,  & release_list ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:38 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
											need_schedule_work  =  1 ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										} 
							 
						 
					
						
							
								
									
										
										
										
											2009-07-25 16:47:45 +02:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										raw_spin_unlock ( & release_list_lock ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:38 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										if  ( need_schedule_work ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											schedule_work ( & release_agent_work ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									} 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2010-03-10 15:22:05 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								/* Caller must verify that the css is not for root cgroup */  
						 
					
						
							
								
									
										
											 
										
											
												cgroup: use negative bias on css->refcnt to block css_tryget()
When a cgroup is about to be removed, cgroup_clear_css_refs() is
called to check and ensure that there are no active css references.
This is currently achieved by dropping the refcnt to zero iff it has
only the base ref.  If all css refs could be dropped to zero, ref
clearing is successful and CSS_REMOVED is set on all css.  If not, the
base ref is restored.  While css ref is zero w/o CSS_REMOVED set, any
css_tryget() attempt on it busy loops so that they are atomic
w.r.t. the whole css ref clearing.
This does work but dropping and re-instating the base ref is somewhat
hairy and makes it difficult to add more logic to the put path as
there are two of them - the regular css_put() and the reversible base
ref clearing.
This patch updates css ref clearing such that blocking new
css_tryget() and putting the base ref are separate operations.
CSS_DEACT_BIAS, defined as INT_MIN, is added to css->refcnt and
css_tryget() busy loops while refcnt is negative.  After all css refs
are deactivated, if they were all one, ref clearing succeeded and
CSS_REMOVED is set and the base ref is put using the regular
css_put(); otherwise, CSS_DEACT_BIAS is subtracted from the refcnts
and the original postive values are restored.
css_refcnt() accessor which always returns the unbiased positive
reference counts is added and used to simplify refcnt usages.  While
at it, relocate and reformat comments in cgroup_has_css_refs().
This separates css->refcnt deactivation and putting the base ref,
which enables the next patch to make ref clearing optional.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizf@cn.fujitsu.com>
											 
										 
										
											2012-04-01 12:09:56 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								bool  __css_tryget ( struct  cgroup_subsys_state  * css )  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
									
										
										
										
											2012-11-05 09:16:58 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									while  ( true )  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										int  t ,  v ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												cgroup: use negative bias on css->refcnt to block css_tryget()
When a cgroup is about to be removed, cgroup_clear_css_refs() is
called to check and ensure that there are no active css references.
This is currently achieved by dropping the refcnt to zero iff it has
only the base ref.  If all css refs could be dropped to zero, ref
clearing is successful and CSS_REMOVED is set on all css.  If not, the
base ref is restored.  While css ref is zero w/o CSS_REMOVED set, any
css_tryget() attempt on it busy loops so that they are atomic
w.r.t. the whole css ref clearing.
This does work but dropping and re-instating the base ref is somewhat
hairy and makes it difficult to add more logic to the put path as
there are two of them - the regular css_put() and the reversible base
ref clearing.
This patch updates css ref clearing such that blocking new
css_tryget() and putting the base ref are separate operations.
CSS_DEACT_BIAS, defined as INT_MIN, is added to css->refcnt and
css_tryget() busy loops while refcnt is negative.  After all css refs
are deactivated, if they were all one, ref clearing succeeded and
CSS_REMOVED is set and the base ref is put using the regular
css_put(); otherwise, CSS_DEACT_BIAS is subtracted from the refcnts
and the original postive values are restored.
css_refcnt() accessor which always returns the unbiased positive
reference counts is added and used to simplify refcnt usages.  While
at it, relocate and reformat comments in cgroup_has_css_refs().
This separates css->refcnt deactivation and putting the base ref,
which enables the next patch to make ref clearing optional.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizf@cn.fujitsu.com>
											 
										 
										
											2012-04-01 12:09:56 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2012-11-05 09:16:58 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										v  =  css_refcnt ( css ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										t  =  atomic_cmpxchg ( & css - > refcnt ,  v ,  v  +  1 ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										if  ( likely ( t  = =  v ) ) 
							 
						 
					
						
							
								
									
										
											 
										
											
												cgroup: use negative bias on css->refcnt to block css_tryget()
When a cgroup is about to be removed, cgroup_clear_css_refs() is
called to check and ensure that there are no active css references.
This is currently achieved by dropping the refcnt to zero iff it has
only the base ref.  If all css refs could be dropped to zero, ref
clearing is successful and CSS_REMOVED is set on all css.  If not, the
base ref is restored.  While css ref is zero w/o CSS_REMOVED set, any
css_tryget() attempt on it busy loops so that they are atomic
w.r.t. the whole css ref clearing.
This does work but dropping and re-instating the base ref is somewhat
hairy and makes it difficult to add more logic to the put path as
there are two of them - the regular css_put() and the reversible base
ref clearing.
This patch updates css ref clearing such that blocking new
css_tryget() and putting the base ref are separate operations.
CSS_DEACT_BIAS, defined as INT_MIN, is added to css->refcnt and
css_tryget() busy loops while refcnt is negative.  After all css refs
are deactivated, if they were all one, ref clearing succeeded and
CSS_REMOVED is set and the base ref is put using the regular
css_put(); otherwise, CSS_DEACT_BIAS is subtracted from the refcnts
and the original postive values are restored.
css_refcnt() accessor which always returns the unbiased positive
reference counts is added and used to simplify refcnt usages.  While
at it, relocate and reformat comments in cgroup_has_css_refs().
This separates css->refcnt deactivation and putting the base ref,
which enables the next patch to make ref clearing optional.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizf@cn.fujitsu.com>
											 
										 
										
											2012-04-01 12:09:56 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
											return  true ; 
							 
						 
					
						
							
								
									
										
										
										
											2012-11-05 09:16:58 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										else  if  ( t  <  0 ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											return  false ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												cgroup: use negative bias on css->refcnt to block css_tryget()
When a cgroup is about to be removed, cgroup_clear_css_refs() is
called to check and ensure that there are no active css references.
This is currently achieved by dropping the refcnt to zero iff it has
only the base ref.  If all css refs could be dropped to zero, ref
clearing is successful and CSS_REMOVED is set on all css.  If not, the
base ref is restored.  While css ref is zero w/o CSS_REMOVED set, any
css_tryget() attempt on it busy loops so that they are atomic
w.r.t. the whole css ref clearing.
This does work but dropping and re-instating the base ref is somewhat
hairy and makes it difficult to add more logic to the put path as
there are two of them - the regular css_put() and the reversible base
ref clearing.
This patch updates css ref clearing such that blocking new
css_tryget() and putting the base ref are separate operations.
CSS_DEACT_BIAS, defined as INT_MIN, is added to css->refcnt and
css_tryget() busy loops while refcnt is negative.  After all css refs
are deactivated, if they were all one, ref clearing succeeded and
CSS_REMOVED is set and the base ref is put using the regular
css_put(); otherwise, CSS_DEACT_BIAS is subtracted from the refcnts
and the original postive values are restored.
css_refcnt() accessor which always returns the unbiased positive
reference counts is added and used to simplify refcnt usages.  While
at it, relocate and reformat comments in cgroup_has_css_refs().
This separates css->refcnt deactivation and putting the base ref,
which enables the next patch to make ref clearing optional.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizf@cn.fujitsu.com>
											 
										 
										
											2012-04-01 12:09:56 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										cpu_relax ( ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2012-11-05 09:16:58 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									} 
							 
						 
					
						
							
								
									
										
											 
										
											
												cgroup: use negative bias on css->refcnt to block css_tryget()
When a cgroup is about to be removed, cgroup_clear_css_refs() is
called to check and ensure that there are no active css references.
This is currently achieved by dropping the refcnt to zero iff it has
only the base ref.  If all css refs could be dropped to zero, ref
clearing is successful and CSS_REMOVED is set on all css.  If not, the
base ref is restored.  While css ref is zero w/o CSS_REMOVED set, any
css_tryget() attempt on it busy loops so that they are atomic
w.r.t. the whole css ref clearing.
This does work but dropping and re-instating the base ref is somewhat
hairy and makes it difficult to add more logic to the put path as
there are two of them - the regular css_put() and the reversible base
ref clearing.
This patch updates css ref clearing such that blocking new
css_tryget() and putting the base ref are separate operations.
CSS_DEACT_BIAS, defined as INT_MIN, is added to css->refcnt and
css_tryget() busy loops while refcnt is negative.  After all css refs
are deactivated, if they were all one, ref clearing succeeded and
CSS_REMOVED is set and the base ref is put using the regular
css_put(); otherwise, CSS_DEACT_BIAS is subtracted from the refcnts
and the original postive values are restored.
css_refcnt() accessor which always returns the unbiased positive
reference counts is added and used to simplify refcnt usages.  While
at it, relocate and reformat comments in cgroup_has_css_refs().
This separates css->refcnt deactivation and putting the base ref,
which enables the next patch to make ref clearing optional.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizf@cn.fujitsu.com>
											 
										 
										
											2012-04-01 12:09:56 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								EXPORT_SYMBOL_GPL ( __css_tryget ) ;  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								/* Caller must verify that the css is not for root cgroup */  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								void  __css_put ( struct  cgroup_subsys_state  * css )  
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:38 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:40:44 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									struct  cgroup  * cgrp  =  css - > cgroup ; 
							 
						 
					
						
							
								
									
										
										
										
											2012-06-14 14:55:30 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									int  v ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												cgroup: use negative bias on css->refcnt to block css_tryget()
When a cgroup is about to be removed, cgroup_clear_css_refs() is
called to check and ensure that there are no active css references.
This is currently achieved by dropping the refcnt to zero iff it has
only the base ref.  If all css refs could be dropped to zero, ref
clearing is successful and CSS_REMOVED is set on all css.  If not, the
base ref is restored.  While css ref is zero w/o CSS_REMOVED set, any
css_tryget() attempt on it busy loops so that they are atomic
w.r.t. the whole css ref clearing.
This does work but dropping and re-instating the base ref is somewhat
hairy and makes it difficult to add more logic to the put path as
there are two of them - the regular css_put() and the reversible base
ref clearing.
This patch updates css ref clearing such that blocking new
css_tryget() and putting the base ref are separate operations.
CSS_DEACT_BIAS, defined as INT_MIN, is added to css->refcnt and
css_tryget() busy loops while refcnt is negative.  After all css refs
are deactivated, if they were all one, ref clearing succeeded and
CSS_REMOVED is set and the base ref is put using the regular
css_put(); otherwise, CSS_DEACT_BIAS is subtracted from the refcnts
and the original postive values are restored.
css_refcnt() accessor which always returns the unbiased positive
reference counts is added and used to simplify refcnt usages.  While
at it, relocate and reformat comments in cgroup_has_css_refs().
This separates css->refcnt deactivation and putting the base ref,
which enables the next patch to make ref clearing optional.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizf@cn.fujitsu.com>
											 
										 
										
											2012-04-01 12:09:56 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:38 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									rcu_read_lock ( ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2012-06-14 14:55:30 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									v  =  css_unbias_refcnt ( atomic_dec_return ( & css - > refcnt ) ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									switch  ( v )  { 
							 
						 
					
						
							
								
									
										
											 
										
											
												cgroup: make css->refcnt clearing on cgroup removal optional
Currently, cgroup removal tries to drain all css references.  If there
are active css references, the removal logic waits and retries
->pre_detroy() until either all refs drop to zero or removal is
cancelled.
This semantics is unusual and adds non-trivial complexity to cgroup
core and IMHO is fundamentally misguided in that it couples internal
implementation details (references to internal data structure) with
externally visible operation (rmdir).  To userland, this is a behavior
peculiarity which is unnecessary and difficult to expect (css refs is
otherwise invisible from userland), and, to policy implementations,
this is an unnecessary restriction (e.g. blkcg wants to hold css refs
for caching purposes but can't as that becomes visible as rmdir hang).
Unfortunately, memcg currently depends on ->pre_destroy() retrials and
cgroup removal vetoing and can't be immmediately switched to the new
behavior.  This patch introduces the new behavior of not waiting for
css refs to drain and maintains the old behavior for subsystems which
have __DEPRECATED_clear_css_refs set.
Once, memcg is updated, we can drop the code paths for the old
behavior as proposed in the following patch.  Note that the following
patch is incorrect in that dput work item is in cgroup and may lose
some of dputs when multiples css's are released back-to-back, and
__css_put() triggers check_for_release() when refcnt reaches 0 instead
of 1; however, it shows what part can be removed.
  http://thread.gmane.org/gmane.linux.kernel.containers/22559/focus=75251
Note that, in not-too-distant future, cgroup core will start emitting
warning messages for subsys which require the old behavior, so please
get moving.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizf@cn.fujitsu.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
											 
										 
										
											2012-04-01 12:09:56 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									case  1 : 
							 
						 
					
						
							
								
									
										
										
										
											2009-04-02 16:57:26 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										if  ( notify_on_release ( cgrp ) )  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											set_bit ( CGRP_RELEASABLE ,  & cgrp - > flags ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											check_for_release ( cgrp ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										} 
							 
						 
					
						
							
								
									
										
											 
										
											
												cgroup: make css->refcnt clearing on cgroup removal optional
Currently, cgroup removal tries to drain all css references.  If there
are active css references, the removal logic waits and retries
->pre_detroy() until either all refs drop to zero or removal is
cancelled.
This semantics is unusual and adds non-trivial complexity to cgroup
core and IMHO is fundamentally misguided in that it couples internal
implementation details (references to internal data structure) with
externally visible operation (rmdir).  To userland, this is a behavior
peculiarity which is unnecessary and difficult to expect (css refs is
otherwise invisible from userland), and, to policy implementations,
this is an unnecessary restriction (e.g. blkcg wants to hold css refs
for caching purposes but can't as that becomes visible as rmdir hang).
Unfortunately, memcg currently depends on ->pre_destroy() retrials and
cgroup removal vetoing and can't be immmediately switched to the new
behavior.  This patch introduces the new behavior of not waiting for
css refs to drain and maintains the old behavior for subsystems which
have __DEPRECATED_clear_css_refs set.
Once, memcg is updated, we can drop the code paths for the old
behavior as proposed in the following patch.  Note that the following
patch is incorrect in that dput work item is in cgroup and may lose
some of dputs when multiples css's are released back-to-back, and
__css_put() triggers check_for_release() when refcnt reaches 0 instead
of 1; however, it shows what part can be removed.
  http://thread.gmane.org/gmane.linux.kernel.containers/22559/focus=75251
Note that, in not-too-distant future, cgroup core will start emitting
warning messages for subsys which require the old behavior, so please
get moving.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizf@cn.fujitsu.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
											 
										 
										
											2012-04-01 12:09:56 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										break ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									case  0 : 
							 
						 
					
						
							
								
									
										
										
										
											2012-11-05 09:16:58 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										schedule_work ( & css - > dput_work ) ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												cgroup: make css->refcnt clearing on cgroup removal optional
Currently, cgroup removal tries to drain all css references.  If there
are active css references, the removal logic waits and retries
->pre_detroy() until either all refs drop to zero or removal is
cancelled.
This semantics is unusual and adds non-trivial complexity to cgroup
core and IMHO is fundamentally misguided in that it couples internal
implementation details (references to internal data structure) with
externally visible operation (rmdir).  To userland, this is a behavior
peculiarity which is unnecessary and difficult to expect (css refs is
otherwise invisible from userland), and, to policy implementations,
this is an unnecessary restriction (e.g. blkcg wants to hold css refs
for caching purposes but can't as that becomes visible as rmdir hang).
Unfortunately, memcg currently depends on ->pre_destroy() retrials and
cgroup removal vetoing and can't be immmediately switched to the new
behavior.  This patch introduces the new behavior of not waiting for
css refs to drain and maintains the old behavior for subsystems which
have __DEPRECATED_clear_css_refs set.
Once, memcg is updated, we can drop the code paths for the old
behavior as proposed in the following patch.  Note that the following
patch is incorrect in that dput work item is in cgroup and may lose
some of dputs when multiples css's are released back-to-back, and
__css_put() triggers check_for_release() when refcnt reaches 0 instead
of 1; however, it shows what part can be removed.
  http://thread.gmane.org/gmane.linux.kernel.containers/22559/focus=75251
Note that, in not-too-distant future, cgroup core will start emitting
warning messages for subsys which require the old behavior, so please
get moving.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizf@cn.fujitsu.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
											 
										 
										
											2012-04-01 12:09:56 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										break ; 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:38 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									} 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									rcu_read_unlock ( ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
									
										
										
										
											2010-03-10 15:22:11 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								EXPORT_SYMBOL_GPL ( __css_put ) ;  
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:38 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								/*
  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  Notify  userspace  when  a  cgroup  is  released ,  by  running  the 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  configured  release  agent  with  the  name  of  the  cgroup  ( path 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  relative  to  the  root  of  cgroup  file  system )  as  the  argument . 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 * 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  Most  likely ,  this  user  command  will  try  to  rmdir  this  cgroup . 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 * 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  This  races  with  the  possibility  that  some  other  task  will  be 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  attached  to  this  cgroup  before  it  is  removed ,  or  that  some  other 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  user  task  will  ' mkdir '  a  child  cgroup  of  this  cgroup .   That ' s  ok . 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  The  presumed  ' rmdir '  will  fail  quietly  if  this  cgroup  is  no  longer 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  unused ,  and  this  cgroup  will  be  reprieved  from  its  death  sentence , 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  to  continue  to  serve  a  useful  existence .   Next  time  it ' s  released , 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  we  will  get  notified  again ,  if  it  still  has  ' notify_on_release '  set . 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 * 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  The  final  arg  to  call_usermodehelper ( )  is  UMH_WAIT_EXEC ,  which 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  means  only  wait  until  the  task  is  successfully  execve ( ) ' d .   The 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  separate  release  agent  task  is  forked  by  call_usermodehelper ( ) , 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  then  control  in  this  thread  returns  here ,  without  waiting  for  the 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  release  agent  task .   We  don ' t  bother  to  wait  because  the  caller  of 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  this  routine  has  no  use  for  the  exit  status  of  the  release  agent 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  task ,  so  no  sense  holding  our  caller  up  for  that . 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								static  void  cgroup_release_agent ( struct  work_struct  * work )  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									BUG_ON ( work  ! =  & release_agent_work ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									mutex_lock ( & cgroup_mutex ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2009-07-25 16:47:45 +02:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									raw_spin_lock ( & release_list_lock ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:38 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									while  ( ! list_empty ( & release_list ) )  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										char  * argv [ 3 ] ,  * envp [ 3 ] ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										int  i ; 
							 
						 
					
						
							
								
									
										
										
										
											2008-07-25 01:46:59 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										char  * pathbuf  =  NULL ,  * agentbuf  =  NULL ; 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:40:44 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										struct  cgroup  * cgrp  =  list_entry ( release_list . next , 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:38 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
														    struct  cgroup , 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
														    release_list ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:40:44 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										list_del_init ( & cgrp - > release_list ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2009-07-25 16:47:45 +02:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										raw_spin_unlock ( & release_list_lock ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:38 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										pathbuf  =  kmalloc ( PAGE_SIZE ,  GFP_KERNEL ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2008-07-25 01:46:59 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										if  ( ! pathbuf ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											goto  continue_free ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										if  ( cgroup_path ( cgrp ,  pathbuf ,  PAGE_SIZE )  <  0 ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											goto  continue_free ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										agentbuf  =  kstrdup ( cgrp - > root - > release_agent_path ,  GFP_KERNEL ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										if  ( ! agentbuf ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											goto  continue_free ; 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:38 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										i  =  0 ; 
							 
						 
					
						
							
								
									
										
										
										
											2008-07-25 01:46:59 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										argv [ i + + ]  =  agentbuf ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										argv [ i + + ]  =  pathbuf ; 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:38 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										argv [ i ]  =  NULL ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										i  =  0 ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										/* minimal command environment */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										envp [ i + + ]  =  " HOME=/ " ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										envp [ i + + ]  =  " PATH=/sbin:/bin:/usr/sbin:/usr/bin " ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										envp [ i ]  =  NULL ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										/* Drop the lock while we invoke the usermode helper,
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										 *  since  the  exec  could  involve  hitting  disk  and  hence 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										 *  be  a  slow  process  */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										mutex_unlock ( & cgroup_mutex ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										call_usermodehelper ( argv [ 0 ] ,  argv ,  envp ,  UMH_WAIT_EXEC ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										mutex_lock ( & cgroup_mutex ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2008-07-25 01:46:59 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								 continue_free : 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										kfree ( pathbuf ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										kfree ( agentbuf ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2009-07-25 16:47:45 +02:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										raw_spin_lock ( & release_list_lock ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:38 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									} 
							 
						 
					
						
							
								
									
										
										
										
											2009-07-25 16:47:45 +02:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									raw_spin_unlock ( & release_list_lock ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:38 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									mutex_unlock ( & cgroup_mutex ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
									
										
										
										
											2008-04-04 14:29:57 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								static  int  __init  cgroup_disable ( char  * str )  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									int  i ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									char  * token ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									while  ( ( token  =  strsep ( & str ,  " , " ) )  ! =  NULL )  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										if  ( ! * token ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											continue ; 
							 
						 
					
						
							
								
									
										
										
										
											2012-09-13 09:50:55 +02:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										for  ( i  =  0 ;  i  <  CGROUP_SUBSYS_COUNT ;  i + + )  { 
							 
						 
					
						
							
								
									
										
										
										
											2008-04-04 14:29:57 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
											struct  cgroup_subsys  * ss  =  subsys [ i ] ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2012-09-13 09:50:55 +02:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
											/*
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											 *  cgroup_disable ,  being  at  boot  time ,  can ' t 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											 *  know  about  module  subsystems ,  so  we  don ' t 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											 *  worry  about  them . 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											 */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											if  ( ! ss  | |  ss - > module ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
												continue ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2008-04-04 14:29:57 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
											if  ( ! strcmp ( token ,  ss - > name ) )  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
												ss - > disabled  =  1 ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
												printk ( KERN_INFO  " Disabling %s control group " 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
													"  subsystem \n " ,  ss - > name ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
												break ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											} 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										} 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									} 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									return  1 ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								__setup ( " cgroup_disable= " ,  cgroup_disable ) ;  
						 
					
						
							
								
									
										
											 
										
											
												cgroup: CSS ID support
Patch for Per-CSS(Cgroup Subsys State) ID and private hierarchy code.
This patch attaches unique ID to each css and provides following.
 - css_lookup(subsys, id)
   returns pointer to struct cgroup_subysys_state of id.
 - css_get_next(subsys, id, rootid, depth, foundid)
   returns the next css under "root" by scanning
When cgroup_subsys->use_id is set, an id for css is maintained.
The cgroup framework only parepares
	- css_id of root css for subsys
	- id is automatically attached at creation of css.
	- id is *not* freed automatically. Because the cgroup framework
	  don't know lifetime of cgroup_subsys_state.
	  free_css_id() function is provided. This must be called by subsys.
There are several reasons to develop this.
	- Saving space .... For example, memcg's swap_cgroup is array of
	  pointers to cgroup. But it is not necessary to be very fast.
	  By replacing pointers(8bytes per ent) to ID (2byes per ent), we can
	  reduce much amount of memory usage.
	- Scanning without lock.
	  CSS_ID provides "scan id under this ROOT" function. By this, scanning
	  css under root can be written without locks.
	  ex)
	  do {
		rcu_read_lock();
		next = cgroup_get_next(subsys, id, root, &found);
		/* check sanity of next here */
		css_tryget();
		rcu_read_unlock();
		id = found + 1
	 } while(...)
Characteristics:
	- Each css has unique ID under subsys.
	- Lifetime of ID is controlled by subsys.
	- css ID contains "ID" and "Depth in hierarchy" and stack of hierarchy
	- Allowed ID is 1-65535, ID 0 is UNUSED ID.
Design Choices:
	- scan-by-ID v.s. scan-by-tree-walk.
	  As /proc's pid scan does, scan-by-ID is robust when scanning is done
	  by following kind of routine.
	  scan -> rest a while(release a lock) -> conitunue from interrupted
	  memcg's hierarchical reclaim does this.
	- When subsys->use_id is set, # of css in the system is limited to
	  65535.
[bharata@linux.vnet.ibm.com: remove rcu_read_lock() from css_get_next()]
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Paul Menage <menage@google.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2009-04-02 16:57:25 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								/*
  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  Functons  for  CSS  ID . 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								/*
  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 * To  get  ID  other  than  0 ,  this  should  be  called  when  ! cgroup_is_removed ( ) . 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								unsigned  short  css_id ( struct  cgroup_subsys_state  * css )  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
									
										
										
										
											2010-05-11 14:06:58 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									struct  css_id  * cssid ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									/*
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  This  css_id ( )  can  return  correct  value  when  somone  has  refcnt 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  on  this  or  this  is  under  rcu_read_lock ( ) .  Once  css - > id  is  allocated , 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  it ' s  unchanged  until  freed . 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 */ 
							 
						 
					
						
							
								
									
										
											 
										
											
												cgroup: use negative bias on css->refcnt to block css_tryget()
When a cgroup is about to be removed, cgroup_clear_css_refs() is
called to check and ensure that there are no active css references.
This is currently achieved by dropping the refcnt to zero iff it has
only the base ref.  If all css refs could be dropped to zero, ref
clearing is successful and CSS_REMOVED is set on all css.  If not, the
base ref is restored.  While css ref is zero w/o CSS_REMOVED set, any
css_tryget() attempt on it busy loops so that they are atomic
w.r.t. the whole css ref clearing.
This does work but dropping and re-instating the base ref is somewhat
hairy and makes it difficult to add more logic to the put path as
there are two of them - the regular css_put() and the reversible base
ref clearing.
This patch updates css ref clearing such that blocking new
css_tryget() and putting the base ref are separate operations.
CSS_DEACT_BIAS, defined as INT_MIN, is added to css->refcnt and
css_tryget() busy loops while refcnt is negative.  After all css refs
are deactivated, if they were all one, ref clearing succeeded and
CSS_REMOVED is set and the base ref is put using the regular
css_put(); otherwise, CSS_DEACT_BIAS is subtracted from the refcnts
and the original postive values are restored.
css_refcnt() accessor which always returns the unbiased positive
reference counts is added and used to simplify refcnt usages.  While
at it, relocate and reformat comments in cgroup_has_css_refs().
This separates css->refcnt deactivation and putting the base ref,
which enables the next patch to make ref clearing optional.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizf@cn.fujitsu.com>
											 
										 
										
											2012-04-01 12:09:56 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									cssid  =  rcu_dereference_check ( css - > id ,  css_refcnt ( css ) ) ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												cgroup: CSS ID support
Patch for Per-CSS(Cgroup Subsys State) ID and private hierarchy code.
This patch attaches unique ID to each css and provides following.
 - css_lookup(subsys, id)
   returns pointer to struct cgroup_subysys_state of id.
 - css_get_next(subsys, id, rootid, depth, foundid)
   returns the next css under "root" by scanning
When cgroup_subsys->use_id is set, an id for css is maintained.
The cgroup framework only parepares
	- css_id of root css for subsys
	- id is automatically attached at creation of css.
	- id is *not* freed automatically. Because the cgroup framework
	  don't know lifetime of cgroup_subsys_state.
	  free_css_id() function is provided. This must be called by subsys.
There are several reasons to develop this.
	- Saving space .... For example, memcg's swap_cgroup is array of
	  pointers to cgroup. But it is not necessary to be very fast.
	  By replacing pointers(8bytes per ent) to ID (2byes per ent), we can
	  reduce much amount of memory usage.
	- Scanning without lock.
	  CSS_ID provides "scan id under this ROOT" function. By this, scanning
	  css under root can be written without locks.
	  ex)
	  do {
		rcu_read_lock();
		next = cgroup_get_next(subsys, id, root, &found);
		/* check sanity of next here */
		css_tryget();
		rcu_read_unlock();
		id = found + 1
	 } while(...)
Characteristics:
	- Each css has unique ID under subsys.
	- Lifetime of ID is controlled by subsys.
	- css ID contains "ID" and "Depth in hierarchy" and stack of hierarchy
	- Allowed ID is 1-65535, ID 0 is UNUSED ID.
Design Choices:
	- scan-by-ID v.s. scan-by-tree-walk.
	  As /proc's pid scan does, scan-by-ID is robust when scanning is done
	  by following kind of routine.
	  scan -> rest a while(release a lock) -> conitunue from interrupted
	  memcg's hierarchical reclaim does this.
	- When subsys->use_id is set, # of css in the system is limited to
	  65535.
[bharata@linux.vnet.ibm.com: remove rcu_read_lock() from css_get_next()]
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Paul Menage <menage@google.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2009-04-02 16:57:25 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									if  ( cssid ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										return  cssid - > id ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									return  0 ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
									
										
										
										
											2010-03-10 15:22:11 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								EXPORT_SYMBOL_GPL ( css_id ) ;  
						 
					
						
							
								
									
										
											 
										
											
												cgroup: CSS ID support
Patch for Per-CSS(Cgroup Subsys State) ID and private hierarchy code.
This patch attaches unique ID to each css and provides following.
 - css_lookup(subsys, id)
   returns pointer to struct cgroup_subysys_state of id.
 - css_get_next(subsys, id, rootid, depth, foundid)
   returns the next css under "root" by scanning
When cgroup_subsys->use_id is set, an id for css is maintained.
The cgroup framework only parepares
	- css_id of root css for subsys
	- id is automatically attached at creation of css.
	- id is *not* freed automatically. Because the cgroup framework
	  don't know lifetime of cgroup_subsys_state.
	  free_css_id() function is provided. This must be called by subsys.
There are several reasons to develop this.
	- Saving space .... For example, memcg's swap_cgroup is array of
	  pointers to cgroup. But it is not necessary to be very fast.
	  By replacing pointers(8bytes per ent) to ID (2byes per ent), we can
	  reduce much amount of memory usage.
	- Scanning without lock.
	  CSS_ID provides "scan id under this ROOT" function. By this, scanning
	  css under root can be written without locks.
	  ex)
	  do {
		rcu_read_lock();
		next = cgroup_get_next(subsys, id, root, &found);
		/* check sanity of next here */
		css_tryget();
		rcu_read_unlock();
		id = found + 1
	 } while(...)
Characteristics:
	- Each css has unique ID under subsys.
	- Lifetime of ID is controlled by subsys.
	- css ID contains "ID" and "Depth in hierarchy" and stack of hierarchy
	- Allowed ID is 1-65535, ID 0 is UNUSED ID.
Design Choices:
	- scan-by-ID v.s. scan-by-tree-walk.
	  As /proc's pid scan does, scan-by-ID is robust when scanning is done
	  by following kind of routine.
	  scan -> rest a while(release a lock) -> conitunue from interrupted
	  memcg's hierarchical reclaim does this.
	- When subsys->use_id is set, # of css in the system is limited to
	  65535.
[bharata@linux.vnet.ibm.com: remove rcu_read_lock() from css_get_next()]
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Paul Menage <menage@google.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2009-04-02 16:57:25 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								unsigned  short  css_depth ( struct  cgroup_subsys_state  * css )  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
									
										
										
										
											2010-05-11 14:06:58 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									struct  css_id  * cssid ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
											 
										
											
												cgroup: use negative bias on css->refcnt to block css_tryget()
When a cgroup is about to be removed, cgroup_clear_css_refs() is
called to check and ensure that there are no active css references.
This is currently achieved by dropping the refcnt to zero iff it has
only the base ref.  If all css refs could be dropped to zero, ref
clearing is successful and CSS_REMOVED is set on all css.  If not, the
base ref is restored.  While css ref is zero w/o CSS_REMOVED set, any
css_tryget() attempt on it busy loops so that they are atomic
w.r.t. the whole css ref clearing.
This does work but dropping and re-instating the base ref is somewhat
hairy and makes it difficult to add more logic to the put path as
there are two of them - the regular css_put() and the reversible base
ref clearing.
This patch updates css ref clearing such that blocking new
css_tryget() and putting the base ref are separate operations.
CSS_DEACT_BIAS, defined as INT_MIN, is added to css->refcnt and
css_tryget() busy loops while refcnt is negative.  After all css refs
are deactivated, if they were all one, ref clearing succeeded and
CSS_REMOVED is set and the base ref is put using the regular
css_put(); otherwise, CSS_DEACT_BIAS is subtracted from the refcnts
and the original postive values are restored.
css_refcnt() accessor which always returns the unbiased positive
reference counts is added and used to simplify refcnt usages.  While
at it, relocate and reformat comments in cgroup_has_css_refs().
This separates css->refcnt deactivation and putting the base ref,
which enables the next patch to make ref clearing optional.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizf@cn.fujitsu.com>
											 
										 
										
											2012-04-01 12:09:56 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									cssid  =  rcu_dereference_check ( css - > id ,  css_refcnt ( css ) ) ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												cgroup: CSS ID support
Patch for Per-CSS(Cgroup Subsys State) ID and private hierarchy code.
This patch attaches unique ID to each css and provides following.
 - css_lookup(subsys, id)
   returns pointer to struct cgroup_subysys_state of id.
 - css_get_next(subsys, id, rootid, depth, foundid)
   returns the next css under "root" by scanning
When cgroup_subsys->use_id is set, an id for css is maintained.
The cgroup framework only parepares
	- css_id of root css for subsys
	- id is automatically attached at creation of css.
	- id is *not* freed automatically. Because the cgroup framework
	  don't know lifetime of cgroup_subsys_state.
	  free_css_id() function is provided. This must be called by subsys.
There are several reasons to develop this.
	- Saving space .... For example, memcg's swap_cgroup is array of
	  pointers to cgroup. But it is not necessary to be very fast.
	  By replacing pointers(8bytes per ent) to ID (2byes per ent), we can
	  reduce much amount of memory usage.
	- Scanning without lock.
	  CSS_ID provides "scan id under this ROOT" function. By this, scanning
	  css under root can be written without locks.
	  ex)
	  do {
		rcu_read_lock();
		next = cgroup_get_next(subsys, id, root, &found);
		/* check sanity of next here */
		css_tryget();
		rcu_read_unlock();
		id = found + 1
	 } while(...)
Characteristics:
	- Each css has unique ID under subsys.
	- Lifetime of ID is controlled by subsys.
	- css ID contains "ID" and "Depth in hierarchy" and stack of hierarchy
	- Allowed ID is 1-65535, ID 0 is UNUSED ID.
Design Choices:
	- scan-by-ID v.s. scan-by-tree-walk.
	  As /proc's pid scan does, scan-by-ID is robust when scanning is done
	  by following kind of routine.
	  scan -> rest a while(release a lock) -> conitunue from interrupted
	  memcg's hierarchical reclaim does this.
	- When subsys->use_id is set, # of css in the system is limited to
	  65535.
[bharata@linux.vnet.ibm.com: remove rcu_read_lock() from css_get_next()]
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Paul Menage <menage@google.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2009-04-02 16:57:25 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									if  ( cssid ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										return  cssid - > depth ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									return  0 ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
									
										
										
										
											2010-03-10 15:22:11 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								EXPORT_SYMBOL_GPL ( css_depth ) ;  
						 
					
						
							
								
									
										
											 
										
											
												cgroup: CSS ID support
Patch for Per-CSS(Cgroup Subsys State) ID and private hierarchy code.
This patch attaches unique ID to each css and provides following.
 - css_lookup(subsys, id)
   returns pointer to struct cgroup_subysys_state of id.
 - css_get_next(subsys, id, rootid, depth, foundid)
   returns the next css under "root" by scanning
When cgroup_subsys->use_id is set, an id for css is maintained.
The cgroup framework only parepares
	- css_id of root css for subsys
	- id is automatically attached at creation of css.
	- id is *not* freed automatically. Because the cgroup framework
	  don't know lifetime of cgroup_subsys_state.
	  free_css_id() function is provided. This must be called by subsys.
There are several reasons to develop this.
	- Saving space .... For example, memcg's swap_cgroup is array of
	  pointers to cgroup. But it is not necessary to be very fast.
	  By replacing pointers(8bytes per ent) to ID (2byes per ent), we can
	  reduce much amount of memory usage.
	- Scanning without lock.
	  CSS_ID provides "scan id under this ROOT" function. By this, scanning
	  css under root can be written without locks.
	  ex)
	  do {
		rcu_read_lock();
		next = cgroup_get_next(subsys, id, root, &found);
		/* check sanity of next here */
		css_tryget();
		rcu_read_unlock();
		id = found + 1
	 } while(...)
Characteristics:
	- Each css has unique ID under subsys.
	- Lifetime of ID is controlled by subsys.
	- css ID contains "ID" and "Depth in hierarchy" and stack of hierarchy
	- Allowed ID is 1-65535, ID 0 is UNUSED ID.
Design Choices:
	- scan-by-ID v.s. scan-by-tree-walk.
	  As /proc's pid scan does, scan-by-ID is robust when scanning is done
	  by following kind of routine.
	  scan -> rest a while(release a lock) -> conitunue from interrupted
	  memcg's hierarchical reclaim does this.
	- When subsys->use_id is set, # of css in the system is limited to
	  65535.
[bharata@linux.vnet.ibm.com: remove rcu_read_lock() from css_get_next()]
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Paul Menage <menage@google.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2009-04-02 16:57:25 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2010-05-11 14:06:59 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								/**
  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *   css_is_ancestor  -  test  " root "  css  is  an  ancestor  of  " child " 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  @ child :  the  css  to  be  tested . 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  @ root :  the  css  supporsed  to  be  an  ancestor  of  the  child . 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 * 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  Returns  true  if  " root "  is  an  ancestor  of  " child "  in  its  hierarchy .  Because 
							 
						 
					
						
							
								
									
										
										
										
											2012-05-29 15:06:24 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								 *  this  function  reads  css - > id ,  the  caller  must  hold  rcu_read_lock ( ) . 
							 
						 
					
						
							
								
									
										
										
										
											2010-05-11 14:06:59 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								 *  But ,  considering  usual  usage ,  the  csses  should  be  valid  objects  after  test . 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  Assuming  that  the  caller  will  do  some  action  to  the  child  if  this  returns 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  returns  true ,  the  caller  must  take  " child " ; s  reference  count . 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  If  " child "  is  valid  object  and  this  returns  true ,  " root "  is  valid ,  too . 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
											 
										
											
												cgroup: CSS ID support
Patch for Per-CSS(Cgroup Subsys State) ID and private hierarchy code.
This patch attaches unique ID to each css and provides following.
 - css_lookup(subsys, id)
   returns pointer to struct cgroup_subysys_state of id.
 - css_get_next(subsys, id, rootid, depth, foundid)
   returns the next css under "root" by scanning
When cgroup_subsys->use_id is set, an id for css is maintained.
The cgroup framework only parepares
	- css_id of root css for subsys
	- id is automatically attached at creation of css.
	- id is *not* freed automatically. Because the cgroup framework
	  don't know lifetime of cgroup_subsys_state.
	  free_css_id() function is provided. This must be called by subsys.
There are several reasons to develop this.
	- Saving space .... For example, memcg's swap_cgroup is array of
	  pointers to cgroup. But it is not necessary to be very fast.
	  By replacing pointers(8bytes per ent) to ID (2byes per ent), we can
	  reduce much amount of memory usage.
	- Scanning without lock.
	  CSS_ID provides "scan id under this ROOT" function. By this, scanning
	  css under root can be written without locks.
	  ex)
	  do {
		rcu_read_lock();
		next = cgroup_get_next(subsys, id, root, &found);
		/* check sanity of next here */
		css_tryget();
		rcu_read_unlock();
		id = found + 1
	 } while(...)
Characteristics:
	- Each css has unique ID under subsys.
	- Lifetime of ID is controlled by subsys.
	- css ID contains "ID" and "Depth in hierarchy" and stack of hierarchy
	- Allowed ID is 1-65535, ID 0 is UNUSED ID.
Design Choices:
	- scan-by-ID v.s. scan-by-tree-walk.
	  As /proc's pid scan does, scan-by-ID is robust when scanning is done
	  by following kind of routine.
	  scan -> rest a while(release a lock) -> conitunue from interrupted
	  memcg's hierarchical reclaim does this.
	- When subsys->use_id is set, # of css in the system is limited to
	  65535.
[bharata@linux.vnet.ibm.com: remove rcu_read_lock() from css_get_next()]
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Paul Menage <menage@google.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2009-04-02 16:57:25 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								bool  css_is_ancestor ( struct  cgroup_subsys_state  * child ,  
						 
					
						
							
								
									
										
											 
										
											
												memcg: fix OOM killer under memcg
This patch tries to fix OOM Killer problems caused by hierarchy.
Now, memcg itself has OOM KILL function (in oom_kill.c) and tries to
kill a task in memcg.
But, when hierarchy is used, it's broken and correct task cannot
be killed. For example, in following cgroup
	/groupA/	hierarchy=1, limit=1G,
		01	nolimit
		02	nolimit
All tasks' memory usage under /groupA, /groupA/01, groupA/02 is limited to
groupA's 1Gbytes but OOM Killer just kills tasks in groupA.
This patch provides makes the bad process be selected from all tasks
under hierarchy. BTW, currently, oom_jiffies is updated against groupA
in above case. oom_jiffies of tree should be updated.
To see how oom_jiffies is used, please check mem_cgroup_oom_called()
callers.
[akpm@linux-foundation.org: build fix]
[akpm@linux-foundation.org: const fix]
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Paul Menage <menage@google.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2009-04-02 16:57:38 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										    const  struct  cgroup_subsys_state  * root ) 
							 
						 
					
						
							
								
									
										
											 
										
											
												cgroup: CSS ID support
Patch for Per-CSS(Cgroup Subsys State) ID and private hierarchy code.
This patch attaches unique ID to each css and provides following.
 - css_lookup(subsys, id)
   returns pointer to struct cgroup_subysys_state of id.
 - css_get_next(subsys, id, rootid, depth, foundid)
   returns the next css under "root" by scanning
When cgroup_subsys->use_id is set, an id for css is maintained.
The cgroup framework only parepares
	- css_id of root css for subsys
	- id is automatically attached at creation of css.
	- id is *not* freed automatically. Because the cgroup framework
	  don't know lifetime of cgroup_subsys_state.
	  free_css_id() function is provided. This must be called by subsys.
There are several reasons to develop this.
	- Saving space .... For example, memcg's swap_cgroup is array of
	  pointers to cgroup. But it is not necessary to be very fast.
	  By replacing pointers(8bytes per ent) to ID (2byes per ent), we can
	  reduce much amount of memory usage.
	- Scanning without lock.
	  CSS_ID provides "scan id under this ROOT" function. By this, scanning
	  css under root can be written without locks.
	  ex)
	  do {
		rcu_read_lock();
		next = cgroup_get_next(subsys, id, root, &found);
		/* check sanity of next here */
		css_tryget();
		rcu_read_unlock();
		id = found + 1
	 } while(...)
Characteristics:
	- Each css has unique ID under subsys.
	- Lifetime of ID is controlled by subsys.
	- css ID contains "ID" and "Depth in hierarchy" and stack of hierarchy
	- Allowed ID is 1-65535, ID 0 is UNUSED ID.
Design Choices:
	- scan-by-ID v.s. scan-by-tree-walk.
	  As /proc's pid scan does, scan-by-ID is robust when scanning is done
	  by following kind of routine.
	  scan -> rest a while(release a lock) -> conitunue from interrupted
	  memcg's hierarchical reclaim does this.
	- When subsys->use_id is set, # of css in the system is limited to
	  65535.
[bharata@linux.vnet.ibm.com: remove rcu_read_lock() from css_get_next()]
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Paul Menage <menage@google.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2009-04-02 16:57:25 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
									
										
										
										
											2010-05-11 14:06:59 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									struct  css_id  * child_id ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									struct  css_id  * root_id ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												cgroup: CSS ID support
Patch for Per-CSS(Cgroup Subsys State) ID and private hierarchy code.
This patch attaches unique ID to each css and provides following.
 - css_lookup(subsys, id)
   returns pointer to struct cgroup_subysys_state of id.
 - css_get_next(subsys, id, rootid, depth, foundid)
   returns the next css under "root" by scanning
When cgroup_subsys->use_id is set, an id for css is maintained.
The cgroup framework only parepares
	- css_id of root css for subsys
	- id is automatically attached at creation of css.
	- id is *not* freed automatically. Because the cgroup framework
	  don't know lifetime of cgroup_subsys_state.
	  free_css_id() function is provided. This must be called by subsys.
There are several reasons to develop this.
	- Saving space .... For example, memcg's swap_cgroup is array of
	  pointers to cgroup. But it is not necessary to be very fast.
	  By replacing pointers(8bytes per ent) to ID (2byes per ent), we can
	  reduce much amount of memory usage.
	- Scanning without lock.
	  CSS_ID provides "scan id under this ROOT" function. By this, scanning
	  css under root can be written without locks.
	  ex)
	  do {
		rcu_read_lock();
		next = cgroup_get_next(subsys, id, root, &found);
		/* check sanity of next here */
		css_tryget();
		rcu_read_unlock();
		id = found + 1
	 } while(...)
Characteristics:
	- Each css has unique ID under subsys.
	- Lifetime of ID is controlled by subsys.
	- css ID contains "ID" and "Depth in hierarchy" and stack of hierarchy
	- Allowed ID is 1-65535, ID 0 is UNUSED ID.
Design Choices:
	- scan-by-ID v.s. scan-by-tree-walk.
	  As /proc's pid scan does, scan-by-ID is robust when scanning is done
	  by following kind of routine.
	  scan -> rest a while(release a lock) -> conitunue from interrupted
	  memcg's hierarchical reclaim does this.
	- When subsys->use_id is set, # of css in the system is limited to
	  65535.
[bharata@linux.vnet.ibm.com: remove rcu_read_lock() from css_get_next()]
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Paul Menage <menage@google.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2009-04-02 16:57:25 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2010-05-11 14:06:59 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									child_id   =  rcu_dereference ( child - > id ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2012-05-29 15:06:24 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									if  ( ! child_id ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										return  false ; 
							 
						 
					
						
							
								
									
										
										
										
											2010-05-11 14:06:59 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									root_id  =  rcu_dereference ( root - > id ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2012-05-29 15:06:24 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									if  ( ! root_id ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										return  false ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									if  ( child_id - > depth  <  root_id - > depth ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										return  false ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									if  ( child_id - > stack [ root_id - > depth ]  ! =  root_id - > id ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										return  false ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									return  true ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												cgroup: CSS ID support
Patch for Per-CSS(Cgroup Subsys State) ID and private hierarchy code.
This patch attaches unique ID to each css and provides following.
 - css_lookup(subsys, id)
   returns pointer to struct cgroup_subysys_state of id.
 - css_get_next(subsys, id, rootid, depth, foundid)
   returns the next css under "root" by scanning
When cgroup_subsys->use_id is set, an id for css is maintained.
The cgroup framework only parepares
	- css_id of root css for subsys
	- id is automatically attached at creation of css.
	- id is *not* freed automatically. Because the cgroup framework
	  don't know lifetime of cgroup_subsys_state.
	  free_css_id() function is provided. This must be called by subsys.
There are several reasons to develop this.
	- Saving space .... For example, memcg's swap_cgroup is array of
	  pointers to cgroup. But it is not necessary to be very fast.
	  By replacing pointers(8bytes per ent) to ID (2byes per ent), we can
	  reduce much amount of memory usage.
	- Scanning without lock.
	  CSS_ID provides "scan id under this ROOT" function. By this, scanning
	  css under root can be written without locks.
	  ex)
	  do {
		rcu_read_lock();
		next = cgroup_get_next(subsys, id, root, &found);
		/* check sanity of next here */
		css_tryget();
		rcu_read_unlock();
		id = found + 1
	 } while(...)
Characteristics:
	- Each css has unique ID under subsys.
	- Lifetime of ID is controlled by subsys.
	- css ID contains "ID" and "Depth in hierarchy" and stack of hierarchy
	- Allowed ID is 1-65535, ID 0 is UNUSED ID.
Design Choices:
	- scan-by-ID v.s. scan-by-tree-walk.
	  As /proc's pid scan does, scan-by-ID is robust when scanning is done
	  by following kind of routine.
	  scan -> rest a while(release a lock) -> conitunue from interrupted
	  memcg's hierarchical reclaim does this.
	- When subsys->use_id is set, # of css in the system is limited to
	  65535.
[bharata@linux.vnet.ibm.com: remove rcu_read_lock() from css_get_next()]
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Paul Menage <menage@google.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2009-04-02 16:57:25 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								void  free_css_id ( struct  cgroup_subsys  * ss ,  struct  cgroup_subsys_state  * css )  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									struct  css_id  * id  =  css - > id ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									/* When this is called before css_id initialization, id can be NULL */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									if  ( ! id ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										return ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									BUG_ON ( ! ss - > use_id ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									rcu_assign_pointer ( id - > css ,  NULL ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									rcu_assign_pointer ( css - > id ,  NULL ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2012-03-21 16:34:21 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									spin_lock ( & ss - > id_lock ) ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												cgroup: CSS ID support
Patch for Per-CSS(Cgroup Subsys State) ID and private hierarchy code.
This patch attaches unique ID to each css and provides following.
 - css_lookup(subsys, id)
   returns pointer to struct cgroup_subysys_state of id.
 - css_get_next(subsys, id, rootid, depth, foundid)
   returns the next css under "root" by scanning
When cgroup_subsys->use_id is set, an id for css is maintained.
The cgroup framework only parepares
	- css_id of root css for subsys
	- id is automatically attached at creation of css.
	- id is *not* freed automatically. Because the cgroup framework
	  don't know lifetime of cgroup_subsys_state.
	  free_css_id() function is provided. This must be called by subsys.
There are several reasons to develop this.
	- Saving space .... For example, memcg's swap_cgroup is array of
	  pointers to cgroup. But it is not necessary to be very fast.
	  By replacing pointers(8bytes per ent) to ID (2byes per ent), we can
	  reduce much amount of memory usage.
	- Scanning without lock.
	  CSS_ID provides "scan id under this ROOT" function. By this, scanning
	  css under root can be written without locks.
	  ex)
	  do {
		rcu_read_lock();
		next = cgroup_get_next(subsys, id, root, &found);
		/* check sanity of next here */
		css_tryget();
		rcu_read_unlock();
		id = found + 1
	 } while(...)
Characteristics:
	- Each css has unique ID under subsys.
	- Lifetime of ID is controlled by subsys.
	- css ID contains "ID" and "Depth in hierarchy" and stack of hierarchy
	- Allowed ID is 1-65535, ID 0 is UNUSED ID.
Design Choices:
	- scan-by-ID v.s. scan-by-tree-walk.
	  As /proc's pid scan does, scan-by-ID is robust when scanning is done
	  by following kind of routine.
	  scan -> rest a while(release a lock) -> conitunue from interrupted
	  memcg's hierarchical reclaim does this.
	- When subsys->use_id is set, # of css in the system is limited to
	  65535.
[bharata@linux.vnet.ibm.com: remove rcu_read_lock() from css_get_next()]
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Paul Menage <menage@google.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2009-04-02 16:57:25 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									idr_remove ( & ss - > idr ,  id - > id ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2012-03-21 16:34:21 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									spin_unlock ( & ss - > id_lock ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2011-03-15 17:56:10 +08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									kfree_rcu ( id ,  rcu_head ) ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												cgroup: CSS ID support
Patch for Per-CSS(Cgroup Subsys State) ID and private hierarchy code.
This patch attaches unique ID to each css and provides following.
 - css_lookup(subsys, id)
   returns pointer to struct cgroup_subysys_state of id.
 - css_get_next(subsys, id, rootid, depth, foundid)
   returns the next css under "root" by scanning
When cgroup_subsys->use_id is set, an id for css is maintained.
The cgroup framework only parepares
	- css_id of root css for subsys
	- id is automatically attached at creation of css.
	- id is *not* freed automatically. Because the cgroup framework
	  don't know lifetime of cgroup_subsys_state.
	  free_css_id() function is provided. This must be called by subsys.
There are several reasons to develop this.
	- Saving space .... For example, memcg's swap_cgroup is array of
	  pointers to cgroup. But it is not necessary to be very fast.
	  By replacing pointers(8bytes per ent) to ID (2byes per ent), we can
	  reduce much amount of memory usage.
	- Scanning without lock.
	  CSS_ID provides "scan id under this ROOT" function. By this, scanning
	  css under root can be written without locks.
	  ex)
	  do {
		rcu_read_lock();
		next = cgroup_get_next(subsys, id, root, &found);
		/* check sanity of next here */
		css_tryget();
		rcu_read_unlock();
		id = found + 1
	 } while(...)
Characteristics:
	- Each css has unique ID under subsys.
	- Lifetime of ID is controlled by subsys.
	- css ID contains "ID" and "Depth in hierarchy" and stack of hierarchy
	- Allowed ID is 1-65535, ID 0 is UNUSED ID.
Design Choices:
	- scan-by-ID v.s. scan-by-tree-walk.
	  As /proc's pid scan does, scan-by-ID is robust when scanning is done
	  by following kind of routine.
	  scan -> rest a while(release a lock) -> conitunue from interrupted
	  memcg's hierarchical reclaim does this.
	- When subsys->use_id is set, # of css in the system is limited to
	  65535.
[bharata@linux.vnet.ibm.com: remove rcu_read_lock() from css_get_next()]
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Paul Menage <menage@google.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2009-04-02 16:57:25 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
									
										
										
										
											2010-03-10 15:22:11 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								EXPORT_SYMBOL_GPL ( free_css_id ) ;  
						 
					
						
							
								
									
										
											 
										
											
												cgroup: CSS ID support
Patch for Per-CSS(Cgroup Subsys State) ID and private hierarchy code.
This patch attaches unique ID to each css and provides following.
 - css_lookup(subsys, id)
   returns pointer to struct cgroup_subysys_state of id.
 - css_get_next(subsys, id, rootid, depth, foundid)
   returns the next css under "root" by scanning
When cgroup_subsys->use_id is set, an id for css is maintained.
The cgroup framework only parepares
	- css_id of root css for subsys
	- id is automatically attached at creation of css.
	- id is *not* freed automatically. Because the cgroup framework
	  don't know lifetime of cgroup_subsys_state.
	  free_css_id() function is provided. This must be called by subsys.
There are several reasons to develop this.
	- Saving space .... For example, memcg's swap_cgroup is array of
	  pointers to cgroup. But it is not necessary to be very fast.
	  By replacing pointers(8bytes per ent) to ID (2byes per ent), we can
	  reduce much amount of memory usage.
	- Scanning without lock.
	  CSS_ID provides "scan id under this ROOT" function. By this, scanning
	  css under root can be written without locks.
	  ex)
	  do {
		rcu_read_lock();
		next = cgroup_get_next(subsys, id, root, &found);
		/* check sanity of next here */
		css_tryget();
		rcu_read_unlock();
		id = found + 1
	 } while(...)
Characteristics:
	- Each css has unique ID under subsys.
	- Lifetime of ID is controlled by subsys.
	- css ID contains "ID" and "Depth in hierarchy" and stack of hierarchy
	- Allowed ID is 1-65535, ID 0 is UNUSED ID.
Design Choices:
	- scan-by-ID v.s. scan-by-tree-walk.
	  As /proc's pid scan does, scan-by-ID is robust when scanning is done
	  by following kind of routine.
	  scan -> rest a while(release a lock) -> conitunue from interrupted
	  memcg's hierarchical reclaim does this.
	- When subsys->use_id is set, # of css in the system is limited to
	  65535.
[bharata@linux.vnet.ibm.com: remove rcu_read_lock() from css_get_next()]
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Paul Menage <menage@google.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2009-04-02 16:57:25 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								/*
  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  This  is  called  by  init  or  create ( ) .  Then ,  calls  to  this  function  are 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  always  serialized  ( By  cgroup_mutex ( )  at  create ( ) ) . 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								static  struct  css_id  * get_new_cssid ( struct  cgroup_subsys  * ss ,  int  depth )  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									struct  css_id  * newid ; 
							 
						 
					
						
							
								
									
										
										
										
											2013-02-27 17:04:54 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									int  ret ,  size ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												cgroup: CSS ID support
Patch for Per-CSS(Cgroup Subsys State) ID and private hierarchy code.
This patch attaches unique ID to each css and provides following.
 - css_lookup(subsys, id)
   returns pointer to struct cgroup_subysys_state of id.
 - css_get_next(subsys, id, rootid, depth, foundid)
   returns the next css under "root" by scanning
When cgroup_subsys->use_id is set, an id for css is maintained.
The cgroup framework only parepares
	- css_id of root css for subsys
	- id is automatically attached at creation of css.
	- id is *not* freed automatically. Because the cgroup framework
	  don't know lifetime of cgroup_subsys_state.
	  free_css_id() function is provided. This must be called by subsys.
There are several reasons to develop this.
	- Saving space .... For example, memcg's swap_cgroup is array of
	  pointers to cgroup. But it is not necessary to be very fast.
	  By replacing pointers(8bytes per ent) to ID (2byes per ent), we can
	  reduce much amount of memory usage.
	- Scanning without lock.
	  CSS_ID provides "scan id under this ROOT" function. By this, scanning
	  css under root can be written without locks.
	  ex)
	  do {
		rcu_read_lock();
		next = cgroup_get_next(subsys, id, root, &found);
		/* check sanity of next here */
		css_tryget();
		rcu_read_unlock();
		id = found + 1
	 } while(...)
Characteristics:
	- Each css has unique ID under subsys.
	- Lifetime of ID is controlled by subsys.
	- css ID contains "ID" and "Depth in hierarchy" and stack of hierarchy
	- Allowed ID is 1-65535, ID 0 is UNUSED ID.
Design Choices:
	- scan-by-ID v.s. scan-by-tree-walk.
	  As /proc's pid scan does, scan-by-ID is robust when scanning is done
	  by following kind of routine.
	  scan -> rest a while(release a lock) -> conitunue from interrupted
	  memcg's hierarchical reclaim does this.
	- When subsys->use_id is set, # of css in the system is limited to
	  65535.
[bharata@linux.vnet.ibm.com: remove rcu_read_lock() from css_get_next()]
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Paul Menage <menage@google.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2009-04-02 16:57:25 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									BUG_ON ( ! ss - > use_id ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									size  =  sizeof ( * newid )  +  sizeof ( unsigned  short )  *  ( depth  +  1 ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									newid  =  kzalloc ( size ,  GFP_KERNEL ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									if  ( ! newid ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										return  ERR_PTR ( - ENOMEM ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2013-02-27 17:04:54 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									idr_preload ( GFP_KERNEL ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2012-03-21 16:34:21 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									spin_lock ( & ss - > id_lock ) ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												cgroup: CSS ID support
Patch for Per-CSS(Cgroup Subsys State) ID and private hierarchy code.
This patch attaches unique ID to each css and provides following.
 - css_lookup(subsys, id)
   returns pointer to struct cgroup_subysys_state of id.
 - css_get_next(subsys, id, rootid, depth, foundid)
   returns the next css under "root" by scanning
When cgroup_subsys->use_id is set, an id for css is maintained.
The cgroup framework only parepares
	- css_id of root css for subsys
	- id is automatically attached at creation of css.
	- id is *not* freed automatically. Because the cgroup framework
	  don't know lifetime of cgroup_subsys_state.
	  free_css_id() function is provided. This must be called by subsys.
There are several reasons to develop this.
	- Saving space .... For example, memcg's swap_cgroup is array of
	  pointers to cgroup. But it is not necessary to be very fast.
	  By replacing pointers(8bytes per ent) to ID (2byes per ent), we can
	  reduce much amount of memory usage.
	- Scanning without lock.
	  CSS_ID provides "scan id under this ROOT" function. By this, scanning
	  css under root can be written without locks.
	  ex)
	  do {
		rcu_read_lock();
		next = cgroup_get_next(subsys, id, root, &found);
		/* check sanity of next here */
		css_tryget();
		rcu_read_unlock();
		id = found + 1
	 } while(...)
Characteristics:
	- Each css has unique ID under subsys.
	- Lifetime of ID is controlled by subsys.
	- css ID contains "ID" and "Depth in hierarchy" and stack of hierarchy
	- Allowed ID is 1-65535, ID 0 is UNUSED ID.
Design Choices:
	- scan-by-ID v.s. scan-by-tree-walk.
	  As /proc's pid scan does, scan-by-ID is robust when scanning is done
	  by following kind of routine.
	  scan -> rest a while(release a lock) -> conitunue from interrupted
	  memcg's hierarchical reclaim does this.
	- When subsys->use_id is set, # of css in the system is limited to
	  65535.
[bharata@linux.vnet.ibm.com: remove rcu_read_lock() from css_get_next()]
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Paul Menage <menage@google.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2009-04-02 16:57:25 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									/* Don't use 0. allocates an ID of 1-65535 */ 
							 
						 
					
						
							
								
									
										
										
										
											2013-02-27 17:04:54 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									ret  =  idr_alloc ( & ss - > idr ,  newid ,  1 ,  CSS_ID_MAX  +  1 ,  GFP_NOWAIT ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2012-03-21 16:34:21 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									spin_unlock ( & ss - > id_lock ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2013-02-27 17:04:54 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									idr_preload_end ( ) ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												cgroup: CSS ID support
Patch for Per-CSS(Cgroup Subsys State) ID and private hierarchy code.
This patch attaches unique ID to each css and provides following.
 - css_lookup(subsys, id)
   returns pointer to struct cgroup_subysys_state of id.
 - css_get_next(subsys, id, rootid, depth, foundid)
   returns the next css under "root" by scanning
When cgroup_subsys->use_id is set, an id for css is maintained.
The cgroup framework only parepares
	- css_id of root css for subsys
	- id is automatically attached at creation of css.
	- id is *not* freed automatically. Because the cgroup framework
	  don't know lifetime of cgroup_subsys_state.
	  free_css_id() function is provided. This must be called by subsys.
There are several reasons to develop this.
	- Saving space .... For example, memcg's swap_cgroup is array of
	  pointers to cgroup. But it is not necessary to be very fast.
	  By replacing pointers(8bytes per ent) to ID (2byes per ent), we can
	  reduce much amount of memory usage.
	- Scanning without lock.
	  CSS_ID provides "scan id under this ROOT" function. By this, scanning
	  css under root can be written without locks.
	  ex)
	  do {
		rcu_read_lock();
		next = cgroup_get_next(subsys, id, root, &found);
		/* check sanity of next here */
		css_tryget();
		rcu_read_unlock();
		id = found + 1
	 } while(...)
Characteristics:
	- Each css has unique ID under subsys.
	- Lifetime of ID is controlled by subsys.
	- css ID contains "ID" and "Depth in hierarchy" and stack of hierarchy
	- Allowed ID is 1-65535, ID 0 is UNUSED ID.
Design Choices:
	- scan-by-ID v.s. scan-by-tree-walk.
	  As /proc's pid scan does, scan-by-ID is robust when scanning is done
	  by following kind of routine.
	  scan -> rest a while(release a lock) -> conitunue from interrupted
	  memcg's hierarchical reclaim does this.
	- When subsys->use_id is set, # of css in the system is limited to
	  65535.
[bharata@linux.vnet.ibm.com: remove rcu_read_lock() from css_get_next()]
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Paul Menage <menage@google.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2009-04-02 16:57:25 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									/* Returns error when there are no free spaces for new ID.*/ 
							 
						 
					
						
							
								
									
										
										
										
											2013-02-27 17:04:54 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									if  ( ret  <  0 ) 
							 
						 
					
						
							
								
									
										
											 
										
											
												cgroup: CSS ID support
Patch for Per-CSS(Cgroup Subsys State) ID and private hierarchy code.
This patch attaches unique ID to each css and provides following.
 - css_lookup(subsys, id)
   returns pointer to struct cgroup_subysys_state of id.
 - css_get_next(subsys, id, rootid, depth, foundid)
   returns the next css under "root" by scanning
When cgroup_subsys->use_id is set, an id for css is maintained.
The cgroup framework only parepares
	- css_id of root css for subsys
	- id is automatically attached at creation of css.
	- id is *not* freed automatically. Because the cgroup framework
	  don't know lifetime of cgroup_subsys_state.
	  free_css_id() function is provided. This must be called by subsys.
There are several reasons to develop this.
	- Saving space .... For example, memcg's swap_cgroup is array of
	  pointers to cgroup. But it is not necessary to be very fast.
	  By replacing pointers(8bytes per ent) to ID (2byes per ent), we can
	  reduce much amount of memory usage.
	- Scanning without lock.
	  CSS_ID provides "scan id under this ROOT" function. By this, scanning
	  css under root can be written without locks.
	  ex)
	  do {
		rcu_read_lock();
		next = cgroup_get_next(subsys, id, root, &found);
		/* check sanity of next here */
		css_tryget();
		rcu_read_unlock();
		id = found + 1
	 } while(...)
Characteristics:
	- Each css has unique ID under subsys.
	- Lifetime of ID is controlled by subsys.
	- css ID contains "ID" and "Depth in hierarchy" and stack of hierarchy
	- Allowed ID is 1-65535, ID 0 is UNUSED ID.
Design Choices:
	- scan-by-ID v.s. scan-by-tree-walk.
	  As /proc's pid scan does, scan-by-ID is robust when scanning is done
	  by following kind of routine.
	  scan -> rest a while(release a lock) -> conitunue from interrupted
	  memcg's hierarchical reclaim does this.
	- When subsys->use_id is set, # of css in the system is limited to
	  65535.
[bharata@linux.vnet.ibm.com: remove rcu_read_lock() from css_get_next()]
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Paul Menage <menage@google.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2009-04-02 16:57:25 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										goto  err_out ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2013-02-27 17:04:54 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									newid - > id  =  ret ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												cgroup: CSS ID support
Patch for Per-CSS(Cgroup Subsys State) ID and private hierarchy code.
This patch attaches unique ID to each css and provides following.
 - css_lookup(subsys, id)
   returns pointer to struct cgroup_subysys_state of id.
 - css_get_next(subsys, id, rootid, depth, foundid)
   returns the next css under "root" by scanning
When cgroup_subsys->use_id is set, an id for css is maintained.
The cgroup framework only parepares
	- css_id of root css for subsys
	- id is automatically attached at creation of css.
	- id is *not* freed automatically. Because the cgroup framework
	  don't know lifetime of cgroup_subsys_state.
	  free_css_id() function is provided. This must be called by subsys.
There are several reasons to develop this.
	- Saving space .... For example, memcg's swap_cgroup is array of
	  pointers to cgroup. But it is not necessary to be very fast.
	  By replacing pointers(8bytes per ent) to ID (2byes per ent), we can
	  reduce much amount of memory usage.
	- Scanning without lock.
	  CSS_ID provides "scan id under this ROOT" function. By this, scanning
	  css under root can be written without locks.
	  ex)
	  do {
		rcu_read_lock();
		next = cgroup_get_next(subsys, id, root, &found);
		/* check sanity of next here */
		css_tryget();
		rcu_read_unlock();
		id = found + 1
	 } while(...)
Characteristics:
	- Each css has unique ID under subsys.
	- Lifetime of ID is controlled by subsys.
	- css ID contains "ID" and "Depth in hierarchy" and stack of hierarchy
	- Allowed ID is 1-65535, ID 0 is UNUSED ID.
Design Choices:
	- scan-by-ID v.s. scan-by-tree-walk.
	  As /proc's pid scan does, scan-by-ID is robust when scanning is done
	  by following kind of routine.
	  scan -> rest a while(release a lock) -> conitunue from interrupted
	  memcg's hierarchical reclaim does this.
	- When subsys->use_id is set, # of css in the system is limited to
	  65535.
[bharata@linux.vnet.ibm.com: remove rcu_read_lock() from css_get_next()]
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Paul Menage <menage@google.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2009-04-02 16:57:25 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									newid - > depth  =  depth ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									return  newid ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								err_out :  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									kfree ( newid ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2013-02-27 17:04:54 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									return  ERR_PTR ( ret ) ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												cgroup: CSS ID support
Patch for Per-CSS(Cgroup Subsys State) ID and private hierarchy code.
This patch attaches unique ID to each css and provides following.
 - css_lookup(subsys, id)
   returns pointer to struct cgroup_subysys_state of id.
 - css_get_next(subsys, id, rootid, depth, foundid)
   returns the next css under "root" by scanning
When cgroup_subsys->use_id is set, an id for css is maintained.
The cgroup framework only parepares
	- css_id of root css for subsys
	- id is automatically attached at creation of css.
	- id is *not* freed automatically. Because the cgroup framework
	  don't know lifetime of cgroup_subsys_state.
	  free_css_id() function is provided. This must be called by subsys.
There are several reasons to develop this.
	- Saving space .... For example, memcg's swap_cgroup is array of
	  pointers to cgroup. But it is not necessary to be very fast.
	  By replacing pointers(8bytes per ent) to ID (2byes per ent), we can
	  reduce much amount of memory usage.
	- Scanning without lock.
	  CSS_ID provides "scan id under this ROOT" function. By this, scanning
	  css under root can be written without locks.
	  ex)
	  do {
		rcu_read_lock();
		next = cgroup_get_next(subsys, id, root, &found);
		/* check sanity of next here */
		css_tryget();
		rcu_read_unlock();
		id = found + 1
	 } while(...)
Characteristics:
	- Each css has unique ID under subsys.
	- Lifetime of ID is controlled by subsys.
	- css ID contains "ID" and "Depth in hierarchy" and stack of hierarchy
	- Allowed ID is 1-65535, ID 0 is UNUSED ID.
Design Choices:
	- scan-by-ID v.s. scan-by-tree-walk.
	  As /proc's pid scan does, scan-by-ID is robust when scanning is done
	  by following kind of routine.
	  scan -> rest a while(release a lock) -> conitunue from interrupted
	  memcg's hierarchical reclaim does this.
	- When subsys->use_id is set, # of css in the system is limited to
	  65535.
[bharata@linux.vnet.ibm.com: remove rcu_read_lock() from css_get_next()]
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Paul Menage <menage@google.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2009-04-02 16:57:25 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2010-03-10 15:22:09 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								static  int  __init_or_module  cgroup_init_idr ( struct  cgroup_subsys  * ss ,  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
													    struct  cgroup_subsys_state  * rootcss ) 
							 
						 
					
						
							
								
									
										
											 
										
											
												cgroup: CSS ID support
Patch for Per-CSS(Cgroup Subsys State) ID and private hierarchy code.
This patch attaches unique ID to each css and provides following.
 - css_lookup(subsys, id)
   returns pointer to struct cgroup_subysys_state of id.
 - css_get_next(subsys, id, rootid, depth, foundid)
   returns the next css under "root" by scanning
When cgroup_subsys->use_id is set, an id for css is maintained.
The cgroup framework only parepares
	- css_id of root css for subsys
	- id is automatically attached at creation of css.
	- id is *not* freed automatically. Because the cgroup framework
	  don't know lifetime of cgroup_subsys_state.
	  free_css_id() function is provided. This must be called by subsys.
There are several reasons to develop this.
	- Saving space .... For example, memcg's swap_cgroup is array of
	  pointers to cgroup. But it is not necessary to be very fast.
	  By replacing pointers(8bytes per ent) to ID (2byes per ent), we can
	  reduce much amount of memory usage.
	- Scanning without lock.
	  CSS_ID provides "scan id under this ROOT" function. By this, scanning
	  css under root can be written without locks.
	  ex)
	  do {
		rcu_read_lock();
		next = cgroup_get_next(subsys, id, root, &found);
		/* check sanity of next here */
		css_tryget();
		rcu_read_unlock();
		id = found + 1
	 } while(...)
Characteristics:
	- Each css has unique ID under subsys.
	- Lifetime of ID is controlled by subsys.
	- css ID contains "ID" and "Depth in hierarchy" and stack of hierarchy
	- Allowed ID is 1-65535, ID 0 is UNUSED ID.
Design Choices:
	- scan-by-ID v.s. scan-by-tree-walk.
	  As /proc's pid scan does, scan-by-ID is robust when scanning is done
	  by following kind of routine.
	  scan -> rest a while(release a lock) -> conitunue from interrupted
	  memcg's hierarchical reclaim does this.
	- When subsys->use_id is set, # of css in the system is limited to
	  65535.
[bharata@linux.vnet.ibm.com: remove rcu_read_lock() from css_get_next()]
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Paul Menage <menage@google.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2009-04-02 16:57:25 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									struct  css_id  * newid ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2012-03-21 16:34:21 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									spin_lock_init ( & ss - > id_lock ) ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												cgroup: CSS ID support
Patch for Per-CSS(Cgroup Subsys State) ID and private hierarchy code.
This patch attaches unique ID to each css and provides following.
 - css_lookup(subsys, id)
   returns pointer to struct cgroup_subysys_state of id.
 - css_get_next(subsys, id, rootid, depth, foundid)
   returns the next css under "root" by scanning
When cgroup_subsys->use_id is set, an id for css is maintained.
The cgroup framework only parepares
	- css_id of root css for subsys
	- id is automatically attached at creation of css.
	- id is *not* freed automatically. Because the cgroup framework
	  don't know lifetime of cgroup_subsys_state.
	  free_css_id() function is provided. This must be called by subsys.
There are several reasons to develop this.
	- Saving space .... For example, memcg's swap_cgroup is array of
	  pointers to cgroup. But it is not necessary to be very fast.
	  By replacing pointers(8bytes per ent) to ID (2byes per ent), we can
	  reduce much amount of memory usage.
	- Scanning without lock.
	  CSS_ID provides "scan id under this ROOT" function. By this, scanning
	  css under root can be written without locks.
	  ex)
	  do {
		rcu_read_lock();
		next = cgroup_get_next(subsys, id, root, &found);
		/* check sanity of next here */
		css_tryget();
		rcu_read_unlock();
		id = found + 1
	 } while(...)
Characteristics:
	- Each css has unique ID under subsys.
	- Lifetime of ID is controlled by subsys.
	- css ID contains "ID" and "Depth in hierarchy" and stack of hierarchy
	- Allowed ID is 1-65535, ID 0 is UNUSED ID.
Design Choices:
	- scan-by-ID v.s. scan-by-tree-walk.
	  As /proc's pid scan does, scan-by-ID is robust when scanning is done
	  by following kind of routine.
	  scan -> rest a while(release a lock) -> conitunue from interrupted
	  memcg's hierarchical reclaim does this.
	- When subsys->use_id is set, # of css in the system is limited to
	  65535.
[bharata@linux.vnet.ibm.com: remove rcu_read_lock() from css_get_next()]
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Paul Menage <menage@google.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2009-04-02 16:57:25 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									idr_init ( & ss - > idr ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									newid  =  get_new_cssid ( ss ,  0 ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									if  ( IS_ERR ( newid ) ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										return  PTR_ERR ( newid ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									newid - > stack [ 0 ]  =  newid - > id ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									newid - > css  =  rootcss ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									rootcss - > id  =  newid ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									return  0 ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								static  int  alloc_css_id ( struct  cgroup_subsys  * ss ,  struct  cgroup  * parent ,  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											struct  cgroup  * child ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									int  subsys_id ,  i ,  depth  =  0 ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									struct  cgroup_subsys_state  * parent_css ,  * child_css ; 
							 
						 
					
						
							
								
									
										
										
										
											2010-04-22 17:30:00 +08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									struct  css_id  * child_id ,  * parent_id ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												cgroup: CSS ID support
Patch for Per-CSS(Cgroup Subsys State) ID and private hierarchy code.
This patch attaches unique ID to each css and provides following.
 - css_lookup(subsys, id)
   returns pointer to struct cgroup_subysys_state of id.
 - css_get_next(subsys, id, rootid, depth, foundid)
   returns the next css under "root" by scanning
When cgroup_subsys->use_id is set, an id for css is maintained.
The cgroup framework only parepares
	- css_id of root css for subsys
	- id is automatically attached at creation of css.
	- id is *not* freed automatically. Because the cgroup framework
	  don't know lifetime of cgroup_subsys_state.
	  free_css_id() function is provided. This must be called by subsys.
There are several reasons to develop this.
	- Saving space .... For example, memcg's swap_cgroup is array of
	  pointers to cgroup. But it is not necessary to be very fast.
	  By replacing pointers(8bytes per ent) to ID (2byes per ent), we can
	  reduce much amount of memory usage.
	- Scanning without lock.
	  CSS_ID provides "scan id under this ROOT" function. By this, scanning
	  css under root can be written without locks.
	  ex)
	  do {
		rcu_read_lock();
		next = cgroup_get_next(subsys, id, root, &found);
		/* check sanity of next here */
		css_tryget();
		rcu_read_unlock();
		id = found + 1
	 } while(...)
Characteristics:
	- Each css has unique ID under subsys.
	- Lifetime of ID is controlled by subsys.
	- css ID contains "ID" and "Depth in hierarchy" and stack of hierarchy
	- Allowed ID is 1-65535, ID 0 is UNUSED ID.
Design Choices:
	- scan-by-ID v.s. scan-by-tree-walk.
	  As /proc's pid scan does, scan-by-ID is robust when scanning is done
	  by following kind of routine.
	  scan -> rest a while(release a lock) -> conitunue from interrupted
	  memcg's hierarchical reclaim does this.
	- When subsys->use_id is set, # of css in the system is limited to
	  65535.
[bharata@linux.vnet.ibm.com: remove rcu_read_lock() from css_get_next()]
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Paul Menage <menage@google.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2009-04-02 16:57:25 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									subsys_id  =  ss - > subsys_id ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									parent_css  =  parent - > subsys [ subsys_id ] ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									child_css  =  child - > subsys [ subsys_id ] ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									parent_id  =  parent_css - > id ; 
							 
						 
					
						
							
								
									
										
										
										
											2010-06-04 14:15:03 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									depth  =  parent_id - > depth  +  1 ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												cgroup: CSS ID support
Patch for Per-CSS(Cgroup Subsys State) ID and private hierarchy code.
This patch attaches unique ID to each css and provides following.
 - css_lookup(subsys, id)
   returns pointer to struct cgroup_subysys_state of id.
 - css_get_next(subsys, id, rootid, depth, foundid)
   returns the next css under "root" by scanning
When cgroup_subsys->use_id is set, an id for css is maintained.
The cgroup framework only parepares
	- css_id of root css for subsys
	- id is automatically attached at creation of css.
	- id is *not* freed automatically. Because the cgroup framework
	  don't know lifetime of cgroup_subsys_state.
	  free_css_id() function is provided. This must be called by subsys.
There are several reasons to develop this.
	- Saving space .... For example, memcg's swap_cgroup is array of
	  pointers to cgroup. But it is not necessary to be very fast.
	  By replacing pointers(8bytes per ent) to ID (2byes per ent), we can
	  reduce much amount of memory usage.
	- Scanning without lock.
	  CSS_ID provides "scan id under this ROOT" function. By this, scanning
	  css under root can be written without locks.
	  ex)
	  do {
		rcu_read_lock();
		next = cgroup_get_next(subsys, id, root, &found);
		/* check sanity of next here */
		css_tryget();
		rcu_read_unlock();
		id = found + 1
	 } while(...)
Characteristics:
	- Each css has unique ID under subsys.
	- Lifetime of ID is controlled by subsys.
	- css ID contains "ID" and "Depth in hierarchy" and stack of hierarchy
	- Allowed ID is 1-65535, ID 0 is UNUSED ID.
Design Choices:
	- scan-by-ID v.s. scan-by-tree-walk.
	  As /proc's pid scan does, scan-by-ID is robust when scanning is done
	  by following kind of routine.
	  scan -> rest a while(release a lock) -> conitunue from interrupted
	  memcg's hierarchical reclaim does this.
	- When subsys->use_id is set, # of css in the system is limited to
	  65535.
[bharata@linux.vnet.ibm.com: remove rcu_read_lock() from css_get_next()]
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Paul Menage <menage@google.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2009-04-02 16:57:25 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									child_id  =  get_new_cssid ( ss ,  depth ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									if  ( IS_ERR ( child_id ) ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										return  PTR_ERR ( child_id ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									for  ( i  =  0 ;  i  <  depth ;  i + + ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										child_id - > stack [ i ]  =  parent_id - > stack [ i ] ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									child_id - > stack [ depth ]  =  child_id - > id ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									/*
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  child_id - > css  pointer  will  be  set  after  this  cgroup  is  available 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  see  cgroup_populate_dir ( ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									rcu_assign_pointer ( child_css - > id ,  child_id ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									return  0 ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								/**
  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  css_lookup  -  lookup  css  by  id 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  @ ss :  cgroup  subsys  to  be  looked  into . 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  @ id :  the  id 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 * 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  Returns  pointer  to  cgroup_subsys_state  if  there  is  valid  one  with  id . 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  NULL  if  not .  Should  be  called  under  rcu_read_lock ( ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								struct  cgroup_subsys_state  * css_lookup ( struct  cgroup_subsys  * ss ,  int  id )  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									struct  css_id  * cssid  =  NULL ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									BUG_ON ( ! ss - > use_id ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									cssid  =  idr_find ( & ss - > idr ,  id ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									if  ( unlikely ( ! cssid ) ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										return  NULL ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									return  rcu_dereference ( cssid - > css ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
									
										
										
										
											2010-03-10 15:22:11 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								EXPORT_SYMBOL_GPL ( css_lookup ) ;  
						 
					
						
							
								
									
										
											 
										
											
												cgroup: CSS ID support
Patch for Per-CSS(Cgroup Subsys State) ID and private hierarchy code.
This patch attaches unique ID to each css and provides following.
 - css_lookup(subsys, id)
   returns pointer to struct cgroup_subysys_state of id.
 - css_get_next(subsys, id, rootid, depth, foundid)
   returns the next css under "root" by scanning
When cgroup_subsys->use_id is set, an id for css is maintained.
The cgroup framework only parepares
	- css_id of root css for subsys
	- id is automatically attached at creation of css.
	- id is *not* freed automatically. Because the cgroup framework
	  don't know lifetime of cgroup_subsys_state.
	  free_css_id() function is provided. This must be called by subsys.
There are several reasons to develop this.
	- Saving space .... For example, memcg's swap_cgroup is array of
	  pointers to cgroup. But it is not necessary to be very fast.
	  By replacing pointers(8bytes per ent) to ID (2byes per ent), we can
	  reduce much amount of memory usage.
	- Scanning without lock.
	  CSS_ID provides "scan id under this ROOT" function. By this, scanning
	  css under root can be written without locks.
	  ex)
	  do {
		rcu_read_lock();
		next = cgroup_get_next(subsys, id, root, &found);
		/* check sanity of next here */
		css_tryget();
		rcu_read_unlock();
		id = found + 1
	 } while(...)
Characteristics:
	- Each css has unique ID under subsys.
	- Lifetime of ID is controlled by subsys.
	- css ID contains "ID" and "Depth in hierarchy" and stack of hierarchy
	- Allowed ID is 1-65535, ID 0 is UNUSED ID.
Design Choices:
	- scan-by-ID v.s. scan-by-tree-walk.
	  As /proc's pid scan does, scan-by-ID is robust when scanning is done
	  by following kind of routine.
	  scan -> rest a while(release a lock) -> conitunue from interrupted
	  memcg's hierarchical reclaim does this.
	- When subsys->use_id is set, # of css in the system is limited to
	  65535.
[bharata@linux.vnet.ibm.com: remove rcu_read_lock() from css_get_next()]
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Paul Menage <menage@google.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2009-04-02 16:57:25 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								/**
  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  css_get_next  -  lookup  next  cgroup  under  specified  hierarchy . 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  @ ss :  pointer  to  subsystem 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  @ id :  current  position  of  iteration . 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  @ root :  pointer  to  css .  search  tree  under  this . 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  @ foundid :  position  of  found  object . 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 * 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  Search  next  css  under  the  specified  hierarchy  of  rootid .  Calling  under 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  rcu_read_lock ( )  is  necessary .  Returns  NULL  if  it  reaches  the  end . 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								struct  cgroup_subsys_state  *  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								css_get_next ( struct  cgroup_subsys  * ss ,  int  id ,  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									     struct  cgroup_subsys_state  * root ,  int  * foundid ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									struct  cgroup_subsys_state  * ret  =  NULL ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									struct  css_id  * tmp ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									int  tmpid ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									int  rootid  =  css_id ( root ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									int  depth  =  css_depth ( root ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									if  ( ! rootid ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										return  NULL ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									BUG_ON ( ! ss - > use_id ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2012-03-21 16:34:21 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									WARN_ON_ONCE ( ! rcu_read_lock_held ( ) ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
											 
										
											
												cgroup: CSS ID support
Patch for Per-CSS(Cgroup Subsys State) ID and private hierarchy code.
This patch attaches unique ID to each css and provides following.
 - css_lookup(subsys, id)
   returns pointer to struct cgroup_subysys_state of id.
 - css_get_next(subsys, id, rootid, depth, foundid)
   returns the next css under "root" by scanning
When cgroup_subsys->use_id is set, an id for css is maintained.
The cgroup framework only parepares
	- css_id of root css for subsys
	- id is automatically attached at creation of css.
	- id is *not* freed automatically. Because the cgroup framework
	  don't know lifetime of cgroup_subsys_state.
	  free_css_id() function is provided. This must be called by subsys.
There are several reasons to develop this.
	- Saving space .... For example, memcg's swap_cgroup is array of
	  pointers to cgroup. But it is not necessary to be very fast.
	  By replacing pointers(8bytes per ent) to ID (2byes per ent), we can
	  reduce much amount of memory usage.
	- Scanning without lock.
	  CSS_ID provides "scan id under this ROOT" function. By this, scanning
	  css under root can be written without locks.
	  ex)
	  do {
		rcu_read_lock();
		next = cgroup_get_next(subsys, id, root, &found);
		/* check sanity of next here */
		css_tryget();
		rcu_read_unlock();
		id = found + 1
	 } while(...)
Characteristics:
	- Each css has unique ID under subsys.
	- Lifetime of ID is controlled by subsys.
	- css ID contains "ID" and "Depth in hierarchy" and stack of hierarchy
	- Allowed ID is 1-65535, ID 0 is UNUSED ID.
Design Choices:
	- scan-by-ID v.s. scan-by-tree-walk.
	  As /proc's pid scan does, scan-by-ID is robust when scanning is done
	  by following kind of routine.
	  scan -> rest a while(release a lock) -> conitunue from interrupted
	  memcg's hierarchical reclaim does this.
	- When subsys->use_id is set, # of css in the system is limited to
	  65535.
[bharata@linux.vnet.ibm.com: remove rcu_read_lock() from css_get_next()]
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Paul Menage <menage@google.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2009-04-02 16:57:25 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									/* fill start point for scan */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									tmpid  =  id ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									while  ( 1 )  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										/*
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										 *  scan  next  entry  from  bitmap ( tree ) ,  tmpid  is  updated  after 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										 *  idr_get_next ( ) . 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										 */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										tmp  =  idr_get_next ( & ss - > idr ,  & tmpid ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										if  ( ! tmp ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											break ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										if  ( tmp - > depth  > =  depth  & &  tmp - > stack [ depth ]  = =  rootid )  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											ret  =  rcu_dereference ( tmp - > css ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											if  ( ret )  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
												* foundid  =  tmpid ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
												break ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											} 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										} 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										/* continue to scan from next id */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										tmpid  =  tmpid  +  1 ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									} 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									return  ret ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2011-02-14 11:20:01 +02:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								/*
  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  get  corresponding  css  from  file  open  on  cgroupfs  directory 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								struct  cgroup_subsys_state  * cgroup_css_from_dir ( struct  file  * f ,  int  id )  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									struct  cgroup  * cgrp ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									struct  inode  * inode ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									struct  cgroup_subsys_state  * css ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2013-01-23 17:07:38 -05:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									inode  =  file_inode ( f ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2011-02-14 11:20:01 +02:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									/* check in cgroup filesystem dir */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									if  ( inode - > i_op  ! =  & cgroup_dir_inode_operations ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										return  ERR_PTR ( - EBADF ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									if  ( id  <  0  | |  id  > =  CGROUP_SUBSYS_COUNT ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										return  ERR_PTR ( - EINVAL ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									/* get cgroup */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									cgrp  =  __d_cgrp ( f - > f_dentry ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									css  =  cgrp - > subsys [ id ] ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									return  css  ?  css  :  ERR_PTR ( - ENOENT ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:20 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								# ifdef CONFIG_CGROUP_DEBUG 
  
						 
					
						
							
								
									
										
										
										
											2012-11-19 08:13:38 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								static  struct  cgroup_subsys_state  * debug_css_alloc ( struct  cgroup  * cont )  
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:20 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									struct  cgroup_subsys_state  * css  =  kzalloc ( sizeof ( * css ) ,  GFP_KERNEL ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									if  ( ! css ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										return  ERR_PTR ( - ENOMEM ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									return  css ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2012-11-19 08:13:38 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								static  void  debug_css_free ( struct  cgroup  * cont )  
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:20 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									kfree ( cont - > subsys [ debug_subsys_id ] ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								static  u64  cgroup_refcount_read ( struct  cgroup  * cont ,  struct  cftype  * cft )  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									return  atomic_read ( & cont - > count ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								static  u64  debug_taskcount_read ( struct  cgroup  * cont ,  struct  cftype  * cft )  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									return  cgroup_task_count ( cont ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								static  u64  current_css_set_read ( struct  cgroup  * cont ,  struct  cftype  * cft )  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									return  ( u64 ) ( unsigned  long ) current - > cgroups ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								static  u64  current_css_set_refcount_read ( struct  cgroup  * cont ,  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
													   struct  cftype  * cft ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									u64  count ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									rcu_read_lock ( ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									count  =  atomic_read ( & current - > cgroups - > refcount ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									rcu_read_unlock ( ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									return  count ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:22 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								static  int  current_css_set_cg_links_read ( struct  cgroup  * cont ,  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
													 struct  cftype  * cft , 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
													 struct  seq_file  * seq ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									struct  cg_cgroup_link  * link ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									struct  css_set  * cg ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									read_lock ( & css_set_lock ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									rcu_read_lock ( ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									cg  =  rcu_dereference ( current - > cgroups ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									list_for_each_entry ( link ,  & cg - > cg_links ,  cg_link_list )  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										struct  cgroup  * c  =  link - > cgrp ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										const  char  * name ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										if  ( c - > dentry ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											name  =  c - > dentry - > d_name . name ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										else 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											name  =  " ? " ; 
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:23 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										seq_printf ( seq ,  " Root %d group %s \n " , 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											   c - > root - > hierarchy_id ,  name ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:22 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									} 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									rcu_read_unlock ( ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									read_unlock ( & css_set_lock ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									return  0 ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								# define MAX_TASKS_SHOWN_PER_CSS 25 
  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								static  int  cgroup_css_links_read ( struct  cgroup  * cont ,  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
												 struct  cftype  * cft , 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
												 struct  seq_file  * seq ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									struct  cg_cgroup_link  * link ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									read_lock ( & css_set_lock ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									list_for_each_entry ( link ,  & cont - > css_sets ,  cgrp_link_list )  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										struct  css_set  * cg  =  link - > cg ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										struct  task_struct  * task ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										int  count  =  0 ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										seq_printf ( seq ,  " css_set %p \n " ,  cg ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										list_for_each_entry ( task ,  & cg - > tasks ,  cg_list )  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											if  ( count + +  >  MAX_TASKS_SHOWN_PER_CSS )  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
												seq_puts ( seq ,  "   ... \n " ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
												break ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											}  else  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
												seq_printf ( seq ,  "   task %d \n " , 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
													   task_pid_vnr ( task ) ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											} 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										} 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									} 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									read_unlock ( & css_set_lock ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									return  0 ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:20 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								static  u64  releasable_read ( struct  cgroup  * cgrp ,  struct  cftype  * cft )  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									return  test_bit ( CGRP_RELEASABLE ,  & cgrp - > flags ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								static  struct  cftype  debug_files [ ]  =   {  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									{ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										. name  =  " cgroup_refcount " , 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										. read_u64  =  cgroup_refcount_read , 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									} , 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									{ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										. name  =  " taskcount " , 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										. read_u64  =  debug_taskcount_read , 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									} , 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									{ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										. name  =  " current_css_set " , 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										. read_u64  =  current_css_set_read , 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									} , 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									{ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										. name  =  " current_css_set_refcount " , 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										. read_u64  =  current_css_set_refcount_read , 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									} , 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:22 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									{ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										. name  =  " current_css_set_cg_links " , 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										. read_seq_string  =  current_css_set_cg_links_read , 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									} , 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									{ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										. name  =  " cgroup_css_links " , 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										. read_seq_string  =  cgroup_css_links_read , 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									} , 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:20 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									{ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										. name  =  " releasable " , 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										. read_u64  =  releasable_read , 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									} , 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2012-04-01 12:09:55 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									{  } 	/* terminate */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								} ;  
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:20 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								struct  cgroup_subsys  debug_subsys  =  {  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									. name  =  " debug " , 
							 
						 
					
						
							
								
									
										
										
										
											2012-11-19 08:13:38 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									. css_alloc  =  debug_css_alloc , 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									. css_free  =  debug_css_free , 
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:20 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									. subsys_id  =  debug_subsys_id , 
							 
						 
					
						
							
								
									
										
										
										
											2012-04-01 12:09:55 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									. base_cftypes  =  debug_files , 
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:20 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								} ;  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								# endif  /* CONFIG_CGROUP_DEBUG */