| 
									
										
										
										
											2006-09-30 20:52:18 +02:00
										 |  |  | /* fs/ internal definitions
 | 
					
						
							|  |  |  |  * | 
					
						
							|  |  |  |  * Copyright (C) 2006 Red Hat, Inc. All Rights Reserved. | 
					
						
							|  |  |  |  * Written by David Howells (dhowells@redhat.com) | 
					
						
							|  |  |  |  * | 
					
						
							|  |  |  |  * This program is free software; you can redistribute it and/or | 
					
						
							|  |  |  |  * modify it under the terms of the GNU General Public License | 
					
						
							|  |  |  |  * as published by the Free Software Foundation; either version | 
					
						
							|  |  |  |  * 2 of the License, or (at your option) any later version. | 
					
						
							|  |  |  |  */ | 
					
						
							|  |  |  | 
 | 
					
						
							| 
									
										
										
										
											2006-08-31 12:55:23 +02:00
										 |  |  | struct super_block; | 
					
						
							| 
									
										
										
										
											2011-03-17 22:08:28 -04:00
										 |  |  | struct file_system_type; | 
					
						
							| 
									
										
											  
											
												CRED: Make execve() take advantage of copy-on-write credentials
Make execve() take advantage of copy-on-write credentials, allowing it to set
up the credentials in advance, and then commit the whole lot after the point
of no return.
This patch and the preceding patches have been tested with the LTP SELinux
testsuite.
This patch makes several logical sets of alteration:
 (1) execve().
     The credential bits from struct linux_binprm are, for the most part,
     replaced with a single credentials pointer (bprm->cred).  This means that
     all the creds can be calculated in advance and then applied at the point
     of no return with no possibility of failure.
     I would like to replace bprm->cap_effective with:
	cap_isclear(bprm->cap_effective)
     but this seems impossible due to special behaviour for processes of pid 1
     (they always retain their parent's capability masks where normally they'd
     be changed - see cap_bprm_set_creds()).
     The following sequence of events now happens:
     (a) At the start of do_execve, the current task's cred_exec_mutex is
     	 locked to prevent PTRACE_ATTACH from obsoleting the calculation of
     	 creds that we make.
     (a) prepare_exec_creds() is then called to make a copy of the current
     	 task's credentials and prepare it.  This copy is then assigned to
     	 bprm->cred.
  	 This renders security_bprm_alloc() and security_bprm_free()
     	 unnecessary, and so they've been removed.
     (b) The determination of unsafe execution is now performed immediately
     	 after (a) rather than later on in the code.  The result is stored in
     	 bprm->unsafe for future reference.
     (c) prepare_binprm() is called, possibly multiple times.
     	 (i) This applies the result of set[ug]id binaries to the new creds
     	     attached to bprm->cred.  Personality bit clearance is recorded,
     	     but now deferred on the basis that the exec procedure may yet
     	     fail.
         (ii) This then calls the new security_bprm_set_creds().  This should
	     calculate the new LSM and capability credentials into *bprm->cred.
	     This folds together security_bprm_set() and parts of
	     security_bprm_apply_creds() (these two have been removed).
	     Anything that might fail must be done at this point.
         (iii) bprm->cred_prepared is set to 1.
	     bprm->cred_prepared is 0 on the first pass of the security
	     calculations, and 1 on all subsequent passes.  This allows SELinux
	     in (ii) to base its calculations only on the initial script and
	     not on the interpreter.
     (d) flush_old_exec() is called to commit the task to execution.  This
     	 performs the following steps with regard to credentials:
	 (i) Clear pdeath_signal and set dumpable in certain circumstances that
	     may not be covered by commit_creds().
         (ii) Clear any bits in current->personality that were deferred from
             (c.i).
     (e) install_exec_creds() [compute_creds() as was] is called to install the
     	 new credentials.  This performs the following steps with regard to
     	 credentials:
         (i) Calls security_bprm_committing_creds() to apply any security
             requirements, such as flushing unauthorised files in SELinux, that
             must be done before the credentials are changed.
	     This is made up of bits of security_bprm_apply_creds() and
	     security_bprm_post_apply_creds(), both of which have been removed.
	     This function is not allowed to fail; anything that might fail
	     must have been done in (c.ii).
         (ii) Calls commit_creds() to apply the new credentials in a single
             assignment (more or less).  Possibly pdeath_signal and dumpable
             should be part of struct creds.
	 (iii) Unlocks the task's cred_replace_mutex, thus allowing
	     PTRACE_ATTACH to take place.
         (iv) Clears the bprm->cred pointer as the credentials it was holding
             are now immutable.
         (v) Calls security_bprm_committed_creds() to apply any security
             alterations that must be done after the creds have been changed.
             SELinux uses this to flush signals and signal handlers.
     (f) If an error occurs before (d.i), bprm_free() will call abort_creds()
     	 to destroy the proposed new credentials and will then unlock
     	 cred_replace_mutex.  No changes to the credentials will have been
     	 made.
 (2) LSM interface.
     A number of functions have been changed, added or removed:
     (*) security_bprm_alloc(), ->bprm_alloc_security()
     (*) security_bprm_free(), ->bprm_free_security()
     	 Removed in favour of preparing new credentials and modifying those.
     (*) security_bprm_apply_creds(), ->bprm_apply_creds()
     (*) security_bprm_post_apply_creds(), ->bprm_post_apply_creds()
     	 Removed; split between security_bprm_set_creds(),
     	 security_bprm_committing_creds() and security_bprm_committed_creds().
     (*) security_bprm_set(), ->bprm_set_security()
     	 Removed; folded into security_bprm_set_creds().
     (*) security_bprm_set_creds(), ->bprm_set_creds()
     	 New.  The new credentials in bprm->cred should be checked and set up
     	 as appropriate.  bprm->cred_prepared is 0 on the first call, 1 on the
     	 second and subsequent calls.
     (*) security_bprm_committing_creds(), ->bprm_committing_creds()
     (*) security_bprm_committed_creds(), ->bprm_committed_creds()
     	 New.  Apply the security effects of the new credentials.  This
     	 includes closing unauthorised files in SELinux.  This function may not
     	 fail.  When the former is called, the creds haven't yet been applied
     	 to the process; when the latter is called, they have.
 	 The former may access bprm->cred, the latter may not.
 (3) SELinux.
     SELinux has a number of changes, in addition to those to support the LSM
     interface changes mentioned above:
     (a) The bprm_security_struct struct has been removed in favour of using
     	 the credentials-under-construction approach.
     (b) flush_unauthorized_files() now takes a cred pointer and passes it on
     	 to inode_has_perm(), file_has_perm() and dentry_open().
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: James Morris <jmorris@namei.org>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: James Morris <jmorris@namei.org>
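
The prepare, modify, then commit-or-abort sequence described in the commit message above can be sketched in user space as follows. This is only an illustration under stated assumptions, not the kernel implementation: `struct demo_cred`, `struct demo_task` and every `demo_*` function are hypothetical names, and the real prepare_exec_creds(), commit_creds() and abort_creds() also deal with reference counting, locking and LSM hooks that are left out here.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct cred; the fields are illustrative only. */
struct demo_cred {
	unsigned int uid;
	unsigned int euid;
};

/* Hypothetical stand-in for a task that points at an immutable cred set. */
struct demo_task {
	const struct demo_cred *cred;
};

/* Give the task an initial, heap-allocated credential set. */
static int demo_task_init(struct demo_task *task, unsigned int uid)
{
	struct demo_cred *c = malloc(sizeof(*c));

	if (!c)
		return -1;
	c->uid = uid;
	c->euid = uid;
	task->cred = c;
	return 0;
}

/* Copy the task's current credentials into a private, mutable set. */
static struct demo_cred *demo_prepare_creds(const struct demo_task *task)
{
	struct demo_cred *new = malloc(sizeof(*new));

	if (new)
		*new = *task->cred;
	return new;
}

/* Publish the new set with one pointer assignment; this step cannot fail. */
static void demo_commit_creds(struct demo_task *task, struct demo_cred *new)
{
	const struct demo_cred *old = task->cred;

	assert(new != NULL);
	task->cred = new;
	free((void *)old);
}

/* Throw away a proposed set; the task's credentials are left untouched. */
static void demo_abort_creds(struct demo_cred *new)
{
	free(new);
}

/*
 * Simplified exec: everything that might fail happens on the private copy,
 * before the commit.  Returns 0 on success and -1 on failure.
 */
static int demo_exec(struct demo_task *task, unsigned int file_owner,
		     int is_setuid, int fail_before_commit)
{
	struct demo_cred *new = demo_prepare_creds(task);

	if (!new)
		return -1;
	if (is_setuid)
		new->euid = file_owner;	/* applied to the copy only */
	if (fail_before_commit) {
		demo_abort_creds(new);	/* no change is visible to the task */
		return -1;
	}
	demo_commit_creds(task, new);	/* point of no return */
	return 0;
}
```

A caller would initialise a task with demo_task_init() and then call demo_exec(); when the call fails before the commit, the task still points at its original credentials, which is the property the commit message relies on.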
											
										 
											2008-11-14 10:39:24 +11:00
										 |  |  | struct linux_binprm; | 
					
						
							| 
									
										
										
										
											2009-03-29 19:00:13 -04:00
										 |  |  | struct path; | 
					
						
							| 
									
										
										
										
											2011-11-24 18:22:03 -05:00
										 |  |  | struct mount; | 
					
						
							| 
									
										
										
										
											2006-08-31 12:55:23 +02:00
										 |  |  | 
 | 
					
						
							| 
									
										
										
										
											2006-09-30 20:52:18 +02:00
										 |  |  | /*
 | 
					
						
							|  |  |  |  * block_dev.c | 
					
						
							|  |  |  |  */ | 
					
						
							| 
									
										
											  
											
												[PATCH] BLOCK: Make it possible to disable the block layer [try #6]
Make it possible to disable the block layer.  Not all embedded devices require
it, some can make do with just JFFS2, NFS, ramfs, etc - none of which require
the block layer to be present.
This patch does the following:
 (*) Introduces CONFIG_BLOCK to disable the block layer, buffering and blockdev
     support.
 (*) Adds dependencies on CONFIG_BLOCK to any configuration item that controls
     an item that uses the block layer.  This includes:
     (*) Block I/O tracing.
     (*) Disk partition code.
     (*) All filesystems that are block based, eg: Ext3, ReiserFS, ISOFS.
     (*) The SCSI layer.  As far as I can tell, even SCSI chardevs use the
     	 block layer to do scheduling.  Some drivers that use SCSI facilities -
     	 such as USB storage - end up disabled indirectly from this.
     (*) Various block-based device drivers, such as IDE and the old CDROM
     	 drivers.
     (*) MTD blockdev handling and FTL.
     (*) JFFS - which uses set_bdev_super(), something it could avoid doing by
     	 taking a leaf out of JFFS2's book.
 (*) Makes most of the contents of linux/blkdev.h, linux/buffer_head.h and
     linux/elevator.h contingent on CONFIG_BLOCK being set.  sector_div() is,
     however, still used in places, and so is still available.
 (*) Also made contingent are the contents of linux/mpage.h, linux/genhd.h and
     parts of linux/fs.h.
 (*) Makes a number of files in fs/ contingent on CONFIG_BLOCK.
 (*) Makes mm/bounce.c (bounce buffering) contingent on CONFIG_BLOCK.
 (*) set_page_dirty() doesn't call __set_page_dirty_buffers() if CONFIG_BLOCK
     is not enabled.
 (*) fs/no-block.c is created to hold out-of-line stubs and things that are
     required when CONFIG_BLOCK is not set:
     (*) Default blockdev file operations (to give error ENODEV on opening).
 (*) Makes some /proc changes:
     (*) /proc/devices does not list any blockdevs.
     (*) /proc/diskstats and /proc/partitions are contingent on CONFIG_BLOCK.
 (*) Makes some compat ioctl handling contingent on CONFIG_BLOCK.
 (*) If CONFIG_BLOCK is not defined, makes sys_quotactl() return -ENODEV if
     given a command other than Q_SYNC or if a special device is specified.
 (*) In init/do_mounts.c, no reference is made to the blockdev routines if
     CONFIG_BLOCK is not defined.  This does not prohibit NFS roots or JFFS2.
 (*) The bdflush, ioprio_set and ioprio_get syscalls can now be absent (return
     error ENOSYS by way of cond_syscall if so).
 (*) The seclvl_bd_claim() and seclvl_bd_release() security calls do nothing if
     CONFIG_BLOCK is not set, since they can't then happen.
Signed-Off-By: David Howells <dhowells@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
											
										 
											2006-09-30 20:45:40 +02:00
										 |  |  | #ifdef CONFIG_BLOCK
 | 
					
						
							| 
									
										
										
										
											2006-09-30 20:52:18 +02:00
										 |  |  | extern void __init bdev_cache_init(void); | 
					
						
							|  |  |  | 
 | 
					
						
							| 
									
										
										
										
											2009-04-27 16:43:51 +02:00
										 |  |  | extern int __sync_blockdev(struct block_device *bdev, int wait); | 
					
						
							|  |  |  | 
 | 
					
						
							| 
									
										
											  
											
											
										 
											2006-09-30 20:45:40 +02:00
										 |  |  | #else
 | 
					
						
							| 
									
										
										
										
											2006-08-31 12:55:23 +02:00
										 |  |  | static inline void bdev_cache_init(void) | 
					
						
							|  |  |  | { | 
					
						
							|  |  |  | } | 
					
						
							| 
									
										
											  
											
											
										 
											2006-09-30 20:45:40 +02:00
										 |  |  | 
 | 
					
						
							| 
									
										
										
										
											2009-04-27 16:43:51 +02:00
										 |  |  | static inline int __sync_blockdev(struct block_device *bdev, int wait) | 
					
						
							|  |  |  | { | 
					
						
							|  |  |  | 	return 0; | 
					
						
							|  |  |  | } | 
					
						
							| 
									
										
											  
											
											
										 
											2006-09-30 20:45:40 +02:00
										 |  |  | #endif
 | 
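
The CONFIG_BLOCK section that ends here pairs each real declaration with an empty static inline stub, so that callers never need their own #ifdef. A minimal user-space sketch of that pattern, seen from the caller's side, might look like the following. `DEMO_CONFIG_BLOCK`, `demo_sync_blockdev()` and `demo_sync_fs()` are hypothetical names chosen for illustration and are not kernel symbols.

```c
#include <assert.h>

/*
 * DEMO_CONFIG_BLOCK is a hypothetical switch standing in for CONFIG_BLOCK.
 * When it is defined, the real function is expected to live in another file.
 */
#ifdef DEMO_CONFIG_BLOCK
int demo_sync_blockdev(int dev_id, int wait);
#else
/* Feature compiled out: the stub reports success and does nothing. */
static inline int demo_sync_blockdev(int dev_id, int wait)
{
	(void)dev_id;
	(void)wait;
	return 0;
}
#endif

/* The caller is written once and builds with or without the feature. */
static int demo_sync_fs(int dev_id)
{
	assert(dev_id >= 0);
	return demo_sync_blockdev(dev_id, 1);
}
```

Because the stub is static inline, a compiler can usually drop the call entirely when the feature is disabled, which is one reason this layout is often preferred over conditional code at each call site.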
					
						
							| 
									
										
										
										
											2006-08-29 19:06:07 +01:00
										 |  |  | 
 | 
					
						
							| 
									
										
										
										
											2014-10-09 15:26:55 -07:00
										 |  |  | /*
 | 
					
						
							|  |  |  |  * buffer.c | 
					
						
							|  |  |  |  */ | 
					
						
							|  |  |  | extern void guard_bio_eod(int rw, struct bio *bio); | 
					
						
							|  |  |  | 
 | 
					
						
							| 
									
										
										
										
											2006-09-30 20:52:18 +02:00
										 |  |  | /*
 | 
					
						
							|  |  |  |  * char_dev.c | 
					
						
							|  |  |  |  */ | 
					
						
							|  |  |  | extern void __init chrdev_init(void); | 
					
						
							|  |  |  | 
 | 
					
						
							| 
									
										
										
										
											2012-06-25 12:55:46 +01:00
										 |  |  | /*
 | 
					
						
							|  |  |  |  * namei.c | 
					
						
							|  |  |  |  */ | 
					
						
							| 
									
										
										
										
											2013-09-08 14:03:27 -04:00
										 |  |  | extern int user_path_mountpoint_at(int, const char __user *, unsigned int, struct path *); | 
					
						
							|  |  |  | extern int vfs_path_lookup(struct dentry *, struct vfsmount *, | 
					
						
							|  |  |  | 			   const char *, unsigned int, struct path *); | 
					
						
							| 
									
										
										
										
											2012-06-25 12:55:46 +01:00
										 |  |  | 
 | 
					
						
							| 
									
										
										
										
											2006-09-30 20:52:18 +02:00
										 |  |  | /*
 | 
					
						
							|  |  |  |  * namespace.c | 
					
						
							|  |  |  |  */ | 
					
						
							|  |  |  | extern int copy_mount_options(const void __user *, unsigned long *); | 
					
						
							| 
									
										
										
										
											2014-08-28 11:26:03 -06:00
										 |  |  | extern char *copy_mount_string(const void __user *); | 
					
						
							| 
									
										
										
										
											2008-03-22 15:48:17 -04:00
										 |  |  | 
 | 
					
						
							| 
									
										
										
										
											2011-03-21 14:28:58 +00:00
										 |  |  | extern struct vfsmount *lookup_mnt(struct path *); | 
					
						
							| 
									
										
										
										
											2011-01-17 01:35:23 -05:00
										 |  |  | extern int finish_automount(struct vfsmount *, struct path *); | 
					
						
							| 
									
										
										
										
											2008-03-22 15:48:17 -04:00
										 |  |  | 
 | 
					
						
							| 
									
										
										
										
											2011-11-21 12:11:31 +01:00
										 |  |  | extern int sb_prepare_remount_readonly(struct super_block *); | 
					
						
							| 
									
										
										
										
											2011-01-14 22:30:21 -05:00
										 |  |  | 
 | 
					
						
							| 
									
										
										
										
											2008-03-22 15:48:17 -04:00
										 |  |  | extern void __init mnt_init(void); | 
					
						
							| 
									
										
										
										
											2009-03-29 19:00:13 -04:00
										 |  |  | 
 | 
					
						
							| 
									
										
										
										
											2012-06-12 16:20:35 +02:00
										 |  |  | extern int __mnt_want_write(struct vfsmount *); | 
					
						
							|  |  |  | extern int __mnt_want_write_file(struct file *); | 
					
						
							|  |  |  | extern void __mnt_drop_write(struct vfsmount *); | 
					
						
							|  |  |  | extern void __mnt_drop_write_file(struct file *); | 
					
						
							| 
									
										
										
										
											2010-02-05 02:01:14 -05:00
										 |  |  | 
 | 
					
						
							| 
									
										
										
										
											2009-03-29 19:00:13 -04:00
										 |  |  | /*
 | 
					
						
							|  |  |  |  * fs_struct.c | 
					
						
							|  |  |  |  */ | 
					
						
							| 
									
										
										
										
											2013-03-01 23:51:07 -05:00
										 |  |  | extern void chroot_fs_refs(const struct path *, const struct path *); | 
					
						
							| 
									
										
										
										
											2009-04-26 20:25:56 +10:00
										 |  |  | 
 | 
					
						
							|  |  |  | /*
 | 
					
						
							|  |  |  |  * file_table.c | 
					
						
							|  |  |  |  */ | 
					
						
							| 
									
										
										
										
											2009-12-04 15:47:36 -05:00
										 |  |  | extern struct file *get_empty_filp(void); | 
					
						
							| 
									
										
										
										
											2009-05-07 03:12:29 -04:00
										 |  |  | 
 | 
					
						
							|  |  |  | /*
 | 
					
						
							|  |  |  |  * super.c | 
					
						
							|  |  |  |  */ | 
					
						
							|  |  |  | extern int do_remount_sb(struct super_block *, int, void *, int); | 
					
						
							| 
									
										
										
										
											2011-07-08 14:14:41 +10:00
										 |  |  | extern bool grab_super_passive(struct super_block *sb); | 
					
						
							| 
									
										
										
										
											2011-03-17 22:08:28 -04:00
										 |  |  | extern struct dentry *mount_fs(struct file_system_type *, | 
					
						
							|  |  |  | 			       int, const char *, void *); | 
					
						
							| 
									
										
										
										
											2012-01-02 22:28:36 -05:00
										 |  |  | extern struct super_block *user_get_super(dev_t); | 
					
						
							| 
									
										
										
										
											2009-12-19 10:10:39 -05:00
										 |  |  | 
 | 
					
						
							|  |  |  | /*
 | 
					
						
							|  |  |  |  * open.c | 
					
						
							|  |  |  |  */ | 
					
						
							| 
									
										
										
										
											2011-02-23 17:44:09 -05:00
										 |  |  | struct open_flags { | 
					
						
							|  |  |  | 	int open_flag; | 
					
						
							| 
									
										
										
										
											2011-11-21 14:59:34 -05:00
										 |  |  | 	umode_t mode; | 
					
						
							| 
									
										
										
										
											2011-02-23 17:44:09 -05:00
										 |  |  | 	int acc_mode; | 
					
						
							|  |  |  | 	int intent; | 
					
						
							| 
									
										
										
										
											2013-06-11 08:23:01 +04:00
										 |  |  | 	int lookup_flags; | 
					
						
							| 
									
										
										
										
											2011-02-23 17:44:09 -05:00
										 |  |  | }; | 
					
						
							| 
									
										
										
										
											2012-10-10 16:43:10 -04:00
										 |  |  | extern struct file *do_filp_open(int dfd, struct filename *pathname, | 
					
						
							| 
									
										
										
										
											2013-06-11 08:23:01 +04:00
										 |  |  | 		const struct open_flags *op); | 
					
						
							| 
									
										
										
										
											2011-03-11 12:08:24 -05:00
										 |  |  | extern struct file *do_file_open_root(struct dentry *, struct vfsmount *, | 
					
						
							| 
									
										
										
										
											2013-06-11 08:23:01 +04:00
										 |  |  | 		const char *, const struct open_flags *); | 
					
						
							| 
									
										
										
										
											2010-10-24 11:13:10 -04:00
										 |  |  | 
 | 
					
						
							| 
									
										
										
										
											2011-01-29 18:43:26 +05:30
										 |  |  | extern long do_handle_open(int mountdirfd, | 
					
						
							|  |  |  | 			   struct file_handle __user *ufh, int open_flag); | 
					
						
							| 
									
										
										
										
											2012-05-21 17:30:15 +02:00
										 |  |  | extern int open_check_o_direct(struct file *f); | 
					
						
							| 
									
										
										
										
											2011-01-29 18:43:26 +05:30
										 |  |  | 
 | 
					
						
							| 
									
										
										
										
											2010-10-24 11:13:10 -04:00
										 |  |  | /*
 | 
					
						
							|  |  |  |  * inode.c | 
					
						
							|  |  |  |  */ | 
					
						
							| 
									
										
										
										
											2011-03-22 22:23:40 +11:00
										 |  |  | extern spinlock_t inode_sb_list_lock; | 
					
						
							| 
									
										
										
										
											2013-08-28 10:18:05 +10:00
										 |  |  | extern long prune_icache_sb(struct super_block *sb, unsigned long nr_to_scan, | 
					
						
							|  |  |  | 			    int nid); | 
					
						
							| 
									
										
										
										
											2012-11-26 16:29:51 -08:00
										 |  |  | extern void inode_add_lru(struct inode *inode); | 
					
						
							| 
									
										
										
										
											2011-03-22 22:23:40 +11:00
										 |  |  | 
 | 
					
						
							| 
									
										
										
										
											2011-03-22 22:23:41 +11:00
										 |  |  | /*
 | 
					
						
							|  |  |  |  * fs-writeback.c | 
					
						
							|  |  |  |  */ | 
					
						
							|  |  |  | extern void inode_wb_list_del(struct inode *inode); | 
					
						
							|  |  |  | 
 | 
					
						
							| 
									
										
											  
											
												fs: bump inode and dentry counters to long
This series reworks our current object cache shrinking infrastructure in
two main ways:
 * Noticing that a lot of users copy and paste their own version of LRU
   lists for objects, we put some effort in providing a generic version.
   It is modeled after the filesystem users: dentries, inodes, and xfs
   (for various tasks), but we expect that other users could benefit in
   the near future with little or no modification.  Let us know if you
   have any issues.
 * The underlying list_lru being proposed automatically and
   transparently keeps the elements in per-node lists, and is able to
   manipulate the node lists individually.  Given this infrastructure, we
   are able to modify the up-to-now hammer called shrink_slab to proceed
   with node-reclaim instead of always searching memory from all over like
   it has been doing.
Per-node lru lists are also expected to lead to less contention in the lru
locks on multi-node scans, since we are now no longer fighting for a
global lock.  The locks usually disappear from the profilers with this
change.
Although we have no official benchmarks for this version - be our guest to
independently evaluate this - earlier versions of this series were
performance tested (details at
http://permalink.gmane.org/gmane.linux.kernel.mm/100537) yielding no
visible performance regressions while yielding a better qualitative
behavior in NUMA machines.
With this infrastructure in place, we can use the list_lru entry point to
provide memcg isolation and per-memcg targeted reclaim.  Historically,
those two pieces of work have been posted together.  This version presents
only the infrastructure work, deferring the memcg work to a later time,
so we can focus on getting this part tested.  You can see more about the
history of such work at http://lwn.net/Articles/552769/
Dave Chinner (18):
  dcache: convert dentry_stat.nr_unused to per-cpu counters
  dentry: move to per-sb LRU locks
  dcache: remove dentries from LRU before putting on dispose list
  mm: new shrinker API
  shrinker: convert superblock shrinkers to new API
  list: add a new LRU list type
  inode: convert inode lru list to generic lru list code.
  dcache: convert to use new lru list infrastructure
  list_lru: per-node list infrastructure
  shrinker: add node awareness
  fs: convert inode and dentry shrinking to be node aware
  xfs: convert buftarg LRU to generic code
  xfs: rework buffer dispose list tracking
  xfs: convert dquot cache lru to list_lru
  fs: convert fs shrinkers to new scan/count API
  drivers: convert shrinkers to new count/scan API
  shrinker: convert remaining shrinkers to count/scan API
  shrinker: Kill old ->shrink API.
Glauber Costa (7):
  fs: bump inode and dentry counters to long
  super: fix calculation of shrinkable objects for small numbers
  list_lru: per-node API
  vmscan: per-node deferred work
  i915: bail out earlier when shrinker cannot acquire mutex
  hugepage: convert huge zero page shrinker to new shrinker API
  list_lru: dynamically adjust node arrays
This patch:
There are situations in very large machines in which we can have a large
quantity of dirty inodes, unused dentries, etc.  This is particularly true
when umounting a filesystem, since every live object will eventually be
discarded.
Dave Chinner reported a problem with this while experimenting with the
shrinker revamp patchset.  So we believe it is time for a change.  This
patch just changes the counters from int to long.  Machines where it
matters should have a 64-bit long anyway.
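
To illustrate the motivation, here is a minimal sketch, not kernel code, of
why an int counter may be too small on very large machines.  The helper
counter_can_hold() and the figure of 3 billion objects are hypothetical and
chosen only for illustration; the sketch assumes a 32-bit int.

```c
#include <assert.h>
#include <limits.h>

/* Illustrative object count (an assumption): 3 billion objects. */
#define EXAMPLE_NR_OBJECTS 3000000000ULL

/*
 * Hypothetical helper, not part of the kernel: returns 1 if a signed
 * counter whose largest value is type_max can represent nr_objects,
 * and 0 otherwise.
 */
static int counter_can_hold(unsigned long long nr_objects,
			    unsigned long long type_max)
{
	return nr_objects <= type_max;
}

/* Does the example count fit in a long on this platform? */
static int long_counter_ok(void)
{
	return counter_can_hold(EXAMPLE_NR_OBJECTS, LONG_MAX);
}

/* With a 32-bit int, the example count cannot be stored in an int. */
static void check_int_is_too_small(void)
{
	assert(!counter_can_hold(EXAMPLE_NR_OBJECTS, INT_MAX));
}
```

On platforms where long is 64 bits wide, long_counter_ok() returns 1, which
matches the expectation that machines large enough to hit the limit have a
wide long.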
Signed-off-by: Glauber Costa <glommer@openvz.org>
Cc: Dave Chinner <dchinner@redhat.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Cc: Arve Hjønnevåg <arve@android.com>
Cc: Carlos Maiolino <cmaiolino@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Chuck Lever <chuck.lever@oracle.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Dave Chinner <dchinner@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Gleb Natapov <gleb@redhat.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: J. Bruce Fields <bfields@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Kent Overstreet <koverstreet@google.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Thomas Hellstrom <thellstrom@vmware.com>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
											
										 
											2013-08-28 10:17:53 +10:00
										 |  |  | extern long get_nr_dirty_inodes(void); | 
					
						
							| 
									
										
										
										
											2010-10-29 05:49:13 -04:00
										 |  |  | extern void evict_inodes(struct super_block *); | 
					
						
							| 
									
										
										
										
											2011-02-24 17:25:47 +11:00
										 |  |  | extern int invalidate_inodes(struct super_block *, bool); | 
					
						
							| 
									
										
										
										
											2011-07-07 15:03:58 -04:00
										 |  |  | 
 | 
					
						
							|  |  |  | /*
 | 
					
						
							|  |  |  |  * dcache.c | 
					
						
							|  |  |  |  */ | 
					
						
							|  |  |  | extern struct dentry *__d_alloc(struct super_block *, const struct qstr *); | 
					
						
							| 
									
										
										
										
											2013-09-05 14:39:11 +02:00
										 |  |  | extern int d_set_mounted(struct dentry *dentry); | 
					
						
							| 
									
										
										
										
											2013-08-28 10:18:05 +10:00
										 |  |  | extern long prune_dcache_sb(struct super_block *sb, unsigned long nr_to_scan, | 
					
						
							|  |  |  | 			    int nid); | 
					
						
							| 
									
										
										
										
											2013-03-20 13:19:30 -04:00
										 |  |  | 
 | 
					
						
							|  |  |  | /*
 | 
					
						
							|  |  |  |  * read_write.c | 
					
						
							|  |  |  |  */ | 
					
						
							| 
									
										
										
										
											2013-06-19 15:26:04 +04:00
										 |  |  | extern int rw_verify_area(int, struct file *, const loff_t *, size_t); | 
					
						
							| 
									
										
										
										
											2013-03-12 09:58:10 -04:00
										 |  |  | 
 | 
					
						
							|  |  |  | /*
 | 
					
						
							|  |  |  |  * pipe.c | 
					
						
							|  |  |  |  */ | 
					
						
							|  |  |  | extern const struct file_operations pipefifo_fops; | 
					
						
							| 
									
										
										
										
											2014-05-21 18:22:52 -04:00
										 |  |  | 
 | 
					
						
							|  |  |  | /*
 | 
					
						
							|  |  |  |  * fs_pin.c | 
					
						
							|  |  |  |  */ | 
					
						
							|  |  |  | extern void sb_pin_kill(struct super_block *sb); | 
					
						
							|  |  |  | extern void mnt_pin_kill(struct mount *m); | 
					
						
							| 
									
										
											  
											
												take the targets of /proc/*/ns/* symlinks to separate fs
New pseudo-filesystem: nsfs.  Targets of /proc/*/ns/* live there now.
It's not mountable (not even registered, so it's not in /proc/filesystems,
etc.).  Files on it *are* bindable - we explicitly permit that in do_loopback().
This stuff lives in fs/nsfs.c now; proc_ns_fget() moved there as well.
get_proc_ns() is a macro now (it's simply returning ->i_private; would
have been an inline, if not for header ordering headache).
proc_ns_inode() is an ex-parrot.  The interface used in procfs is
ns_get_path(path, task, ops) and ns_get_name(buf, size, task, ops).
Dentries and inodes are never hashed; a non-counting reference to dentry
is stashed in ns_common (removed by ->d_prune()) and reused by ns_get_path()
if present.  See ns_get_path()/ns_prune_dentry/nsfs_evict() for details
of that mechanism.
As a result, proc_ns_follow_link() has stopped poking in nd->path.mnt;
it does nd_jump_link() on a consistent <vfsmount,dentry> pair it gets
from ns_get_path().
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
											
										 
											2014-11-01 10:57:28 -04:00
										 |  |  | 
 | 
					
						
							|  |  |  | /*
 | 
					
						
							|  |  |  |  * fs/nsfs.c | 
					
						
							|  |  |  |  */ | 
					
						
							|  |  |  | extern struct dentry_operations ns_dentry_operations; |