linux-pinenote

Author	SHA1	Message	Date
Yan Zheng	2456242530	Btrfs: hold trans_mutex when using btrfs_record_root_in_trans btrfs_record_root_in_trans needs the trans_mutex held to make sure two callers don't race to setup the root in a given transaction. This adds it to all the places that were missing it. Signed-off-by: Yan Zheng <zheng.yan@oracle.com>	2009-02-12 14:14:53 -05:00
Chris Mason	4008c04a07	Btrfs: make a lockdep class for the extent buffer locks Btrfs is currently using spin_lock_nested with a nested value based on the tree depth of the block. But, this doesn't quite work because the max tree depth is bigger than what spin_lock_nested can deal with, and because locks are sometimes taken before the level field is filled in. The solution here is to use lockdep_set_class_and_name instead, and to set the class before unlocking the pages when the block is read from the disk and just after init of a freshly allocated tree block. btrfs_clear_path_blocking is also changed to take the locks in the proper order, and it also makes sure all the locks currently held are properly set to blocking before it tries to retake the spinlocks. Otherwise, lockdep gets upset about bad lock orderin. The lockdep magic cam from Peter Zijlstra <peterz@infradead.org> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-02-12 14:09:45 -05:00
Linus Torvalds	071a0bc2ce	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/slab-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/slab-2.6: mm: Export symbol ksize()	2009-02-12 09:56:14 -08:00
Nick Piggin	3a4c6800f3	Fix page writeback thinko, causing Berkeley DB slowdown A bug was introduced into write_cache_pages cyclic writeout by commit `31a12666d8` ("mm: write_cache_pages cyclic fix"). The intention (and comments) is that we should cycle back and look for more dirty pages at the beginning of the file if there is no more work to be done. But the !done condition was dropped from the test. This means that any time the page writeout loop breaks (eg. due to nr_to_write == 0), we will set index to 0, then goto again. This will set done_index to index, then find done is set, so will proceed to the end of the function. When updating mapping->writeback_index for cyclic writeout, we now use done_index == 0, so we're always cycling back to 0. This seemed to be causing random mmap writes (slapadd and iozone) to start writing more pages from the LRU and writeout would slowdown, and caused bugzilla entry http://bugzilla.kernel.org/show_bug.cgi?id=12604 about Berkeley DB slowing down dramatically. With this patch, iozone random write performance is increased nearly 5x on my system (iozone -B -r 4k -s 64k -s 512m -s 1200m on ext2). Signed-off-by: Nick Piggin <npiggin@suse.de> Reported-and-tested-by: Jan Kara <jack@suse.cz> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-02-12 08:10:53 -08:00
Kirill A. Shutemov	b1aabecd55	mm: Export symbol ksize() Commit `7b2cd92adc` ("crypto: api - Fix zeroing on free") added modular user of ksize(). Export that to fix crypto.ko compilation. Cc: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Kirill A. Shutemov <kirill@shutemov.name> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>	2009-02-12 17:50:46 +02:00
Julia Lawall	3f3420df50	Btrfs: fs/btrfs/volumes.c: remove useless kzalloc The call to kzalloc is followed by a kmalloc whose result is stored in the same variable. The semantic match that finds the problem is as follows: (http://www.emn.fr/x-info/coccinelle/) // <smpl> @r exists@ local idexpression x; statement S; expression E; identifier f,l; position p1,p2; expression ptr != NULL; @@ ( if ((x@p1 = $kmalloc\\|kzalloc\\|kcalloc$(...)) == NULL) S \| x@p1 = $kmalloc\\|kzalloc\\|kcalloc$(...); ... if (x == NULL) S ) <... when != x when != if (...) { <+...x...+> } x->f = E ...> ( return $0\\|<+...x...+>\\|ptr$; \| return@p2 ...; ) @script:python@ p1 << r.p1; p2 << r.p2; @@ print " file: %s kmalloc %s return %s" % (p1[0].file,p1[0].line,p2[0].line) // </smpl> Signed-off-by: Julia Lawall <julia@diku.dk> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-02-12 10:16:03 -05:00
Qinghuang Feng	a48ddf08ba	Btrfs: remove unused code in split_state() These two lines are not used, remove them. Signed-off-by: Qinghuang Feng <qhfeng.kernel@gmail.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-02-12 14:25:23 -05:00
Jeff Mahoney	e00f730865	Btrfs: remove btrfs_init_path btrfs_init_path was initially used when the path objects were on the stack. Now all the work is done by btrfs_alloc_path and btrfs_init_path isn't required. This patch removes it, and just uses kmem_cache_zalloc to zero out the object. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-02-12 14:11:25 -05:00
Jeff Mahoney	7951f3cefb	Btrfs: balance_level checks !child after access The BUG_ON() is in the wrong spot. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-02-12 10:06:15 -05:00
Yan Zheng	b335b0034e	Btrfs: Avoid using __GFP_HIGHMEM with slab allocator btrfs_releasepage may call kmem_cache_alloc indirectly, and provide same GFP flags it gets to kmem_cache_alloc. So it's possible to use __GFP_HIGHMEM with the slab allocator. Signed-off-by: Yan Zheng <zheng.yan@oracle.com>	2009-02-12 10:06:04 -05:00
Chris Mason	e1df36d2f1	Btrfs: don't clean old snapshots on sync(1) Cleaning old snapshots can make sync(1) somewhat slow, and some users and applications still use it in a global fsync kind of workload. This patch changes btrfs not to clean old snapshots during sync, which is safe from a FS consistency point of view. The major downside is that it makes it difficult to tell when old snapshots have been reaped and the space they were using has been reclaimed. A new ioctl will be added for this purpose instead. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-02-12 09:45:08 -05:00
Chris Mason	536ac8ae86	Btrfs: use larger metadata clusters in ssd mode Larger metadata clusters can significantly improve writeback performance on ssd drives with large erasure blocks. The larger clusters make it more likely a given IO will completely overwrite the ssd block, so it doesn't have to do an internal rwm cycle. On spinning media, lager metadata clusters end up spreading out the metadata more over time, which makes fsck slower, so we don't want this to be the default. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-02-12 09:41:38 -05:00
Chris Mason	b288052e17	Btrfs: process mount options on mount -o remount, Btrfs wasn't parsing any new mount options during remount, making it difficult to set mount options on a root drive. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-02-12 09:37:35 -05:00
Josef Bacik	eb09967089	Btrfs: make sure all pending extent operations are complete Theres a slight problem with finish_current_insert, if we set all to 1 and then go through and don't actually skip any of the extents on the pending list, we could exit right after we've added new extents. This is a problem because by inserting the new extents we could have gotten new COW's to happen and such, so we may have some pending updates to do or even more inserts to do after that. So this patch will only exit if we have never skipped any of the extents in the pending list, and we have no extents to insert, this will make sure that all of the pending work is truly done before we return. I've been running with this patch for a few days with all of my other testing and have not seen issues. Thanks, Signed-off-by: Josef Bacik <jbacik@redhat.com>	2009-02-12 09:27:38 -05:00
Ingo Molnar	a0490fa35d	sched: cpu hotplug fix rq_attach_root() does a kfree() with the runqueue lock held. That's not a very wise move, fix it. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-02-12 11:57:36 +01:00
Tetsuo Handa	35d50e60e8	tomoyo: fix sparse warning Fix sparse warning. $ make C=2 SUBDIRS=security/tomoyo CF="-D__cold__=" CHECK security/tomoyo/common.c CHECK security/tomoyo/realpath.c CHECK security/tomoyo/tomoyo.c security/tomoyo/tomoyo.c:110:8: warning: symbol 'buf' shadows an earlier one security/tomoyo/tomoyo.c💯7: originally declared here Signed-off-by: Kentaro Takeda <takedakn@nttdata.co.jp> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: Toshiharu Harada <haradats@nttdata.co.jp> Signed-off-by: James Morris <jmorris@namei.org>	2009-02-12 20:21:10 +11:00
Kuninori Morimoto	34aeb43e2d	serial: sh-sci: fix overrun error handling for SH7785 SCIF. There was a typo for the overrun bit definition, causing it not to be handled correctly on SH7785, fix it up. Signed-off-by: Kuninori Morimoto <morimoto.kuninori@renesas.com> Signed-off-by: Paul Mundt <lethal@linux-sh.org>	2009-02-12 17:27:12 +09:00
Tobias Klauser	270c5609e2	sh: Storage class should be before const qualifier The C99 specification states in section 6.11.5: The placement of a storage-class specifier other than at the beginning of the declaration specifiers in a declaration is an obsolescent feature. Signed-off-by: Tobias Klauser <tklauser@distanz.ch> Signed-off-by: Paul Mundt <lethal@linux-sh.org>	2009-02-12 17:26:09 +09:00
Suresh Siddha	be03d9e802	x86, pat: fix warn_on_once() while mapping 0-1MB range with /dev/mem Jeff Mahoney reported: > With Suse's hwinfo tool, on -tip: > WARNING: at arch/x86/mm/pat.c:637 reserve_pfn_range+0x5b/0x26d() reserve_pfn_range() is not tracking the memory range below 1MB as non-RAM and as such is inconsistent with similar checks in reserve_memtype() and free_memtype() Rename the pagerange_is_ram() to pat_pagerange_is_ram() and add the "track legacy 1MB region as non RAM" condition. And also, fix reserve_pfn_range() to return -EINVAL, when the pfn range is RAM. This is to be consistent with this API design. Reported-and-tested-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-02-12 08:27:27 +01:00
Jeremy Fitzhardinge	4f06b0436b	x86/cpa: make sure cpa is safe to call in lazy mmu mode Impact: fix race leading to crash under KVM and Xen The CPA code may be called while we're in lazy mmu update mode - for example, when using DEBUG_PAGE_ALLOC and doing a slab allocation in an interrupt handler which interrupted a lazy mmu update. In this case, the in-memory pagetable state may be out of date due to pending queued updates. We need to flush any pending updates before inspecting the page table. Similarly, we must explicitly flush any modifications CPA may have made (which comes down to flushing queued operations when flushing the TLB). Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Acked-by: Marcelo Tosatti <mtosatti@redhat.com> Cc: Stable Kernel <stable@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-02-12 08:27:26 +01:00
James Morris	42d5aaa2d8	security: change link order of LSMs so security=tomoyo works LSMs need to be linked before root_plug to ensure the security= boot parameter works with them. Do this for Tomoyo. (root_plug probably needs to be taken out and shot at some point, too). Signed-off-by: James Morris <jmorris@namei.org>	2009-02-12 16:29:04 +11:00
Kentaro Takeda	d74db3b22a	MAINTAINERS info The archive of tomoyo-users-en mailing list is available at http://lists.sourceforge.jp/mailman/archives/tomoyo-users-en/ . Mailing lists for Japanese users are at http://lists.sourceforge.jp/mailman/archives/tomoyo-users/ and http://lists.sourceforge.jp/mailman/archives/tomoyo-dev/ . TOMOYO Linux English portal is at http://elinux.org/TomoyoLinux . Signed-off-by: Kentaro Takeda <takedakn@nttdata.co.jp> Signed-off-by: James Morris <jmorris@namei.org>	2009-02-12 15:19:01 +11:00
Kentaro Takeda	00d7d6f840	Kconfig and Makefile TOMOYO uses LSM hooks for pathname based access control and securityfs support. Signed-off-by: Kentaro Takeda <takedakn@nttdata.co.jp> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: James Morris <jmorris@namei.org>	2009-02-12 15:19:00 +11:00
Kentaro Takeda	f743324377	LSM adapter functions. DAC's permissions and TOMOYO's permissions are not one-to-one mapping. Regarding DAC, there are "read", "write", "execute" permissions. Regarding TOMOYO, there are "allow_read", "allow_write", "allow_read/write", "allow_execute", "allow_create", "allow_unlink", "allow_mkdir", "allow_rmdir", "allow_mkfifo", "allow_mksock", "allow_mkblock", "allow_mkchar", "allow_truncate", "allow_symlink", "allow_rewrite", "allow_link", "allow_rename" permissions. +----------------------------------+----------------------------------+ \| requested operation \| required TOMOYO's permission \| +----------------------------------+----------------------------------+ \| sys_open(O_RDONLY) \| allow_read \| +----------------------------------+----------------------------------+ \| sys_open(O_WRONLY) \| allow_write \| +----------------------------------+----------------------------------+ \| sys_open(O_RDWR) \| allow_read/write \| +----------------------------------+----------------------------------+ \| open_exec() from do_execve() \| allow_execute \| +----------------------------------+----------------------------------+ \| open_exec() from !do_execve() \| allow_read \| +----------------------------------+----------------------------------+ \| sys_read() \| (none) \| +----------------------------------+----------------------------------+ \| sys_write() \| (none) \| +----------------------------------+----------------------------------+ \| sys_mmap() \| (none) \| +----------------------------------+----------------------------------+ \| sys_uselib() \| allow_read \| +----------------------------------+----------------------------------+ \| sys_open(O_CREAT) \| allow_create \| +----------------------------------+----------------------------------+ \| sys_open(O_TRUNC) \| allow_truncate \| +----------------------------------+----------------------------------+ \| sys_truncate() \| allow_truncate \| +----------------------------------+----------------------------------+ \| sys_ftruncate() \| allow_truncate \| +----------------------------------+----------------------------------+ \| sys_open() without O_APPEND \| allow_rewrite \| +----------------------------------+----------------------------------+ \| setfl() without O_APPEND \| allow_rewrite \| +----------------------------------+----------------------------------+ \| sys_sysctl() for writing \| allow_write \| +----------------------------------+----------------------------------+ \| sys_sysctl() for reading \| allow_read \| +----------------------------------+----------------------------------+ \| sys_unlink() \| allow_unlink \| +----------------------------------+----------------------------------+ \| sys_mknod(S_IFREG) \| allow_create \| +----------------------------------+----------------------------------+ \| sys_mknod(0) \| allow_create \| +----------------------------------+----------------------------------+ \| sys_mknod(S_IFIFO) \| allow_mkfifo \| +----------------------------------+----------------------------------+ \| sys_mknod(S_IFSOCK) \| allow_mksock \| +----------------------------------+----------------------------------+ \| sys_bind(AF_UNIX) \| allow_mksock \| +----------------------------------+----------------------------------+ \| sys_mknod(S_IFBLK) \| allow_mkblock \| +----------------------------------+----------------------------------+ \| sys_mknod(S_IFCHR) \| allow_mkchar \| +----------------------------------+----------------------------------+ \| sys_symlink() \| allow_symlink \| +----------------------------------+----------------------------------+ \| sys_mkdir() \| allow_mkdir \| +----------------------------------+----------------------------------+ \| sys_rmdir() \| allow_rmdir \| +----------------------------------+----------------------------------+ \| sys_link() \| allow_link \| +----------------------------------+----------------------------------+ \| sys_rename() \| allow_rename \| +----------------------------------+----------------------------------+ TOMOYO requires "allow_execute" permission of a pathname passed to do_execve() but does not require "allow_read" permission of that pathname. Let's consider 3 patterns (statically linked, dynamically linked, shell script). This description is to some degree simplified. $ cat hello.c #include <stdio.h> int main() { printf("Hello\n"); return 0; } $ cat hello.sh #! /bin/sh echo "Hello" $ gcc -static -o hello-static hello.c $ gcc -o hello-dynamic hello.c $ chmod 755 hello.sh Case 1 -- Executing hello-static from bash. (1) The bash process calls fork() and the child process requests do_execve("hello-static"). (2) The kernel checks "allow_execute hello-static" from "bash" domain. (3) The kernel calculates "bash hello-static" as the domain to transit to. (4) The kernel overwrites the child process by "hello-static". (5) The child process transits to "bash hello-static" domain. (6) The "hello-static" starts and finishes. Case 2 -- Executing hello-dynamic from bash. (1) The bash process calls fork() and the child process requests do_execve("hello-dynamic"). (2) The kernel checks "allow_execute hello-dynamic" from "bash" domain. (3) The kernel calculates "bash hello-dynamic" as the domain to transit to. (4) The kernel checks "allow_read ld-linux.so" from "bash hello-dynamic" domain. I think permission to access ld-linux.so should be charged hello-dynamic program, for "hello-dynamic needs ld-linux.so" is not a fault of bash program. (5) The kernel overwrites the child process by "hello-dynamic". (6) The child process transits to "bash hello-dynamic" domain. (7) The "hello-dynamic" starts and finishes. Case 3 -- Executing hello.sh from bash. (1) The bash process calls fork() and the child process requests do_execve("hello.sh"). (2) The kernel checks "allow_execute hello.sh" from "bash" domain. (3) The kernel calculates "bash hello.sh" as the domain to transit to. (4) The kernel checks "allow_read /bin/sh" from "bash hello.sh" domain. I think permission to access /bin/sh should be charged hello.sh program, for "hello.sh needs /bin/sh" is not a fault of bash program. (5) The kernel overwrites the child process by "/bin/sh". (6) The child process transits to "bash hello.sh" domain. (7) The "/bin/sh" requests open("hello.sh"). (8) The kernel checks "allow_read hello.sh" from "bash hello.sh" domain. (9) The "/bin/sh" starts and finishes. Whether a file is interpreted as a program or not depends on an application. The kernel cannot know whether the file is interpreted as a program or not. Thus, TOMOYO treats "hello-static" "hello-dynamic" "ld-linux.so" "hello.sh" "/bin/sh" equally as merely files; no distinction between executable and non-executable. Therefore, TOMOYO doesn't check DAC's execute permission. TOMOYO checks "allow_read" permission instead. Calling do_execve() is a bold gesture that an old program's instance (i.e. current process) is ready to be overwritten by a new program and is ready to transfer control to the new program. To split purview of programs, TOMOYO requires "allow_execute" permission of the new program against the old program's instance and performs domain transition. If do_execve() succeeds, the old program is no longer responsible against the consequence of the new program's behavior. Only the new program is responsible for all consequences. But TOMOYO doesn't require "allow_read" permission of the new program. If TOMOYO requires "allow_read" permission of the new program, TOMOYO will allow an attacker (who hijacked the old program's instance) to open the new program and steal data from the new program. Requiring "allow_read" permission will widen purview of the old program. Not requiring "allow_read" permission of the new program against the old program's instance is my design for reducing purview of the old program. To be able to know whether the current process is in do_execve() or not, I want to add in_execve flag to "task_struct". Signed-off-by: Kentaro Takeda <takedakn@nttdata.co.jp> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: Toshiharu Harada <haradats@nttdata.co.jp> Signed-off-by: James Morris <jmorris@namei.org>	2009-02-12 15:15:05 +11:00
Kentaro Takeda	26a2a1c9eb	Domain transition handler. This file controls domain creation/deletion/transition. Every process belongs to a domain in TOMOYO Linux. Domain transition occurs when execve(2) is called and the domain is expressed as 'process invocation history', such as '<kernel> /sbin/init /etc/init.d/rc'. Domain information is stored in current->cred->security field. Signed-off-by: Kentaro Takeda <takedakn@nttdata.co.jp> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: Toshiharu Harada <haradats@nttdata.co.jp> Signed-off-by: James Morris <jmorris@namei.org>	2009-02-12 15:15:05 +11:00
Kentaro Takeda	b69a54ee58	File operation restriction part. This file controls file related operations of TOMOYO Linux. tomoyo/tomoyo.c calls the following six functions in this file. Each function handles the following access types. * tomoyo_check_file_perm sysctl()'s "read" and "write". * tomoyo_check_exec_perm "execute". * tomoyo_check_open_permission open(2) for "read" and "write". * tomoyo_check_1path_perm "create", "unlink", "mkdir", "rmdir", "mkfifo", "mksock", "mkblock", "mkchar", "truncate" and "symlink". * tomoyo_check_2path_perm "rename" and "unlink". * tomoyo_check_rewrite_permission "rewrite". ("rewrite" are operations which may lose already recorded data of a file, i.e. open(!O_APPEND) \|\| open(O_TRUNC) \|\| truncate() \|\| ftruncate()) The functions which actually checks ACLs are the following three functions. Each function handles the following access types. ACL directive is expressed by "allow_<access type>". * tomoyo_check_file_acl Open() operation and execve() operation. ("read", "write", "read/write" and "execute") * tomoyo_check_single_write_acl Directory modification operations with 1 pathname. ("create", "unlink", "mkdir", "rmdir", "mkfifo", "mksock", "mkblock", "mkchar", "truncate", "symlink" and "rewrite") * tomoyo_check_double_write_acl Directory modification operations with 2 pathname. ("link" and "rename") Also, this file contains handlers of some utility directives for file related operations. * "allow_read": specifies globally (for all domains) readable files. * "path_group": specifies pathname macro. * "deny_rewrite": restricts rewrite operation. Signed-off-by: Kentaro Takeda <takedakn@nttdata.co.jp> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: Toshiharu Harada <haradats@nttdata.co.jp> Signed-off-by: James Morris <jmorris@namei.org>	2009-02-12 15:15:05 +11:00
Kentaro Takeda	9590837b89	Common functions for TOMOYO Linux. This file contains common functions (e.g. policy I/O, pattern matching). -------------------- About pattern matching -------------------- Since TOMOYO Linux is a name based access control, TOMOYO Linux seriously considers "safe" string representation. TOMOYO Linux's string manipulation functions make reviewers feel crazy, but there are reasons why TOMOYO Linux needs its own string manipulation functions. ----- Part 1 : preconditions ----- People definitely want to use wild card. To support pattern matching, we have to support wild card characters. In a typical Linux system, filenames are likely consists of only alphabets, numbers, and some characters (e.g. + - ~ . / ). But theoretically, the Linux kernel accepts all characters but NUL character (which is used as a terminator of a string). Some Linux systems can have filenames which contain * ? ** etc. Therefore, we have to somehow modify string so that we can distinguish wild card characters and normal characters. It might be possible for some application's configuration files to restrict acceptable characters. It is impossible for kernel to restrict acceptable characters. We can't accept approaches which will cause troubles for applications. ----- Part 2 : commonly used approaches ----- Text formatted strings separated by space character (0x20) and new line character (0x0A) is more preferable for users over array of NUL-terminated string. Thus, people use text formatted configuration files separated by space character and new line. We sometimes need to handle non-printable characters. Thus, people use \ character (0x5C) as escape character and represent non-printable characters using octal or hexadecimal format. At this point, we remind (at least) 3 approaches. (1) Shell glob style expression (2) POSIX regular expression (UNIX style regular expression) (3) Maverick wild card expression On the surface, (1) and (2) sound good choices. But they have a big pitfall. All meta-characters in (1) and (2) are legal characters for representing a pathname, and users easily write incorrect expression. What is worse, users unlikely notice incorrect expressions because characters used for regular pathnames unlikely contain meta-characters. This incorrect use of meta-characters in pathname representation reveals vulnerability (e.g. unexpected results) only when irregular pathname is specified. The authors of TOMOYO Linux think that approaches which adds some character for interpreting meta-characters as normal characters (i.e. (1) and (2)) are not suitable for security use. Therefore, the authors of TOMOYO Linux propose (3). ----- Part 3: consideration points ----- We need to solve encoding problem. A single character can be represented in several ways using encodings. For Japanese language, there are "ShiftJIS", "ISO-2022-JP", "EUC-JP", "UTF-8" and more. Some languages (e.g. Japanese language) supports multi-byte characters (where a single character is represented using several bytes). Some multi-byte characters may match the escape character. For Japanese language, some characters in "ShiftJIS" encoding match \ character, and bothering Web's CGI developers. It is important that the kernel string is not bothered by encoding problem. Linus said, "I really would expect that kernel strings don't have an encoding. They're just C strings: a NUL-terminated stream of bytes." http://lkml.org/lkml/2007/11/6/142 Yes. The kernel strings are just C strings. We are talking about how to store and carry "kernel strings" safely. If we store "kernel string" into policy file as-is, the "kernel string" will be interpreted differently depending on application's encoding settings. One application may interpret "kernel string" as "UTF-8", another application may interpret "kernel string" as "ShiftJIS". Therefore, we propose to represent strings using ASCII encoding. In this way, we are no longer bothered by encoding problems. We need to avoid information loss caused by display. It is difficult to input and display non-printable characters, but we have to be able to handle such characters because the kernel string is a C string. If we use only ASCII printable characters (from 0x21 to 0x7E) and space character (0x20) and new line character (0x0A), it is easy to input from keyboard and display on all terminals which is running Linux. Therefore, we propose to represent strings using only characters which value is one of "from 0x21 to 0x7E", "0x20", "0x0A". We need to consider ease of splitting strings from a line. If we use an approach which uses "\ " for representing a space character within a string, we have to count the string from the beginning to check whether this space character is accompanied with \ character or not. As a result, we cannot monotonically split a line using space character. If we use an approach which uses "\040" for representing a space character within a string, we can monotonically split a line using space character. If we use an approach which uses NUL character as a delimiter, we cannot use string manipulation functions for splitting strings from a line. Therefore, we propose that we represent space character as "\040". We need to avoid wrong designations (incorrect use of special characters). Not all users can understand and utilize POSIX's regular expressions correctly and perfectly. If a character acts as a wild card by default, the user will get unexpected result if that user didn't know the meaning of that character. Therefore, we propose that all characters but \ character act as a normal character and let the user add \ character to make a character act as a wild card. In this way, users needn't to know all wild card characters beforehand. They can learn when they encountered an unseen wild card character for their first time. ----- Part 4: supported wild card expressions ----- At this point, we have wild card expressions listed below. +-----------+--------------------------------------------------------------+ \| Wild card \| Meaning and example \| +-----------+--------------------------------------------------------------+ \| \* \| More than or equals to 0 character other than '/'. \| \| \| /var/log/samba/\* \| +-----------+--------------------------------------------------------------+ \| \@ \| More than or equals to 0 character other than '/' or '.'. \| \| \| /var/www/html/\@.html \| +-----------+--------------------------------------------------------------+ \| \? \| 1 byte character other than '/'. \| \| \| /tmp/mail.\?\?\?\?\?\? \| +-----------+--------------------------------------------------------------+ \| \$ \| More than or equals to 1 decimal digit. \| \| \| /proc/\$/cmdline \| +-----------+--------------------------------------------------------------+ \| \+ \| 1 decimal digit. \| \| \| /var/tmp/my_work.\+ \| +-----------+--------------------------------------------------------------+ \| \X \| More than or equals to 1 hexadecimal digit. \| \| \| /var/tmp/my-work.\X \| +-----------+--------------------------------------------------------------+ \| \x \| 1 hexadecimal digit. \| \| \| /tmp/my-work.\x \| +-----------+--------------------------------------------------------------+ \| \A \| More than or equals to 1 alphabet character. \| \| \| /var/log/my-work/\$-\A-\$.log \| +-----------+--------------------------------------------------------------+ \| \a \| 1 alphabet character. \| \| \| /home/users/\a/\/public_html/\.html \| +-----------+--------------------------------------------------------------+ \| \- \| Pathname subtraction operator. \| \| \| +---------------------+------------------------------------+ \| \| \| \| Example \| Meaning \| \| \| \| +---------------------+------------------------------------+ \| \| \| \| /etc/\* \| All files in /etc/ directory. \| \| \| \| +---------------------+------------------------------------+ \| \| \| \| /etc/\\-\shadow\* \| /etc/\* other than /etc/\shadow\ \| \| \| \| +---------------------+------------------------------------+ \| \| \| \| /\\-proc\-sys/ \| /\/ other than /proc/ /sys/ \| \| \| \| +---------------------+------------------------------------+ \| +-----------+--------------------------------------------------------------+ +----------------+---------------------------------------------------------+ \| Representation \| Meaning and example \| +----------------+---------------------------------------------------------+ \| \\ \| backslash character itself. \| +----------------+---------------------------------------------------------+ \| \ooo \| 1 byte character. \| \| \| ooo is 001 <= ooo <= 040 \|\| 177 <= ooo <= 377. \| \| \| \| \| \| \040 for space character. \| \| \| \177 for del character. \| \| \| \| +----------------+---------------------------------------------------------+ ----- Part 5: Advantages ----- We can obtain extensibility. Since our proposed approach adds \ to a character to interpret as a wild card, we can introduce new wild card in future while maintaining backward compatibility. We can process monotonically. Since our proposed approach separates strings using a space character, we can split strings using existing string manipulation functions. We can reliably analyze access logs. It is guaranteed that a string doesn't contain space character (0x20) and new line character (0x0A). It is guaranteed that a string won't be converted by FTP and won't be damaged by a terminal's settings. It is guaranteed that a string won't be affected by encoding converters (except encodings which insert NUL character (e.g. UTF-16)). ----- Part 6: conclusion ----- TOMOYO Linux is using its own encoding with reasons described above. There is a disadvantage that we need to introduce a series of new string manipulation functions. But TOMOYO Linux's encoding is useful for all users (including audit and AppArmor) who want to perform pattern matching and safely exchange string information between the kernel and the userspace. -------------------- About policy interface -------------------- TOMOYO Linux creates the following files on securityfs (normally mounted on /sys/kernel/security) as interfaces between kernel and userspace. These files are for TOMOYO Linux management tools only, not for general programs. * profile * exception_policy * domain_policy * manager * meminfo * self_domain * version * .domain_status * .process_status /sys/kernel/security/tomoyo/profile This file is used to read or write profiles. "profile" means a running mode of process. A profile lists up functions and their modes in "$number-$variable=$value" format. The $number is profile number between 0 and 255. Each domain is assigned one profile. To assign profile to domains, use "ccs-setprofile" or "ccs-editpolicy" or "ccs-loadpolicy" commands. (Example) [root@tomoyo]# cat /sys/kernel/security/tomoyo/profile 0-COMMENT=-----Disabled Mode----- 0-MAC_FOR_FILE=disabled 0-MAX_ACCEPT_ENTRY=2048 0-TOMOYO_VERBOSE=disabled 1-COMMENT=-----Learning Mode----- 1-MAC_FOR_FILE=learning 1-MAX_ACCEPT_ENTRY=2048 1-TOMOYO_VERBOSE=disabled 2-COMMENT=-----Permissive Mode----- 2-MAC_FOR_FILE=permissive 2-MAX_ACCEPT_ENTRY=2048 2-TOMOYO_VERBOSE=enabled 3-COMMENT=-----Enforcing Mode----- 3-MAC_FOR_FILE=enforcing 3-MAX_ACCEPT_ENTRY=2048 3-TOMOYO_VERBOSE=enabled - MAC_FOR_FILE: Specifies access control level regarding file access requests. - MAX_ACCEPT_ENTRY: Limits the max number of ACL entries that are automatically appended during learning mode. Default is 2048. - TOMOYO_VERBOSE: Specifies whether to print domain policy violation messages or not. /sys/kernel/security/tomoyo/manager This file is used to read or append the list of programs or domains that can write to /sys/kernel/security/tomoyo interface. By default, only processes with both UID = 0 and EUID = 0 can modify policy via /sys/kernel/security/tomoyo interface. You can use keyword "manage_by_non_root" to allow policy modification by non root user. (Example) [root@tomoyo]# cat /sys/kernel/security/tomoyo/manager /usr/lib/ccs/loadpolicy /usr/lib/ccs/editpolicy /usr/lib/ccs/setlevel /usr/lib/ccs/setprofile /usr/lib/ccs/ld-watch /usr/lib/ccs/ccs-queryd /sys/kernel/security/tomoyo/exception_policy This file is used to read and write system global settings. Each line has a directive and operand pair. Directives are listed below. - initialize_domain: To initialize domain transition when specific program is executed, use initialize_domain directive. * initialize_domain "program" from "domain" * initialize_domain "program" from "the last program part of domain" * initialize_domain "program" If the part "from" and after is not given, the entry is applied to all domain. If the "domain" doesn't start with "<kernel>", the entry is applied to all domain whose domainname ends with "the last program part of domain". This directive is intended to aggregate domain transitions for daemon program and program that are invoked by the kernel on demand, by transiting to different domain. - keep_domain To prevent domain transition when program is executed from specific domain, use keep_domain directive. * keep_domain "program" from "domain" * keep_domain "program" from "the last program part of domain" * keep_domain "domain" * keep_domain "the last program part of domain" If the part "from" and before is not given, this entry is applied to all program. If the "domain" doesn't start with "<kernel>", the entry is applied to all domain whose domainname ends with "the last program part of domain". This directive is intended to reduce total number of domains and memory usage by suppressing unneeded domain transitions. To declare domain keepers, use keep_domain directive followed by domain definition. Any process that belongs to any domain declared with this directive, the process stays at the same domain unless any program registered with initialize_domain directive is executed. In order to control domain transition in detail, you can use no_keep_domain/no_initialize_domain keywrods. - alias: To allow executing programs using the name of symbolic links, use alias keyword followed by dereferenced pathname and reference pathname. For example, /sbin/pidof is a symbolic link to /sbin/killall5 . In normal case, if /sbin/pidof is executed, the domain is defined as if /sbin/killall5 is executed. By specifying "alias /sbin/killall5 /sbin/pidof", you can run /sbin/pidof in the domain for /sbin/pidof . (Example) alias /sbin/killall5 /sbin/pidof - allow_read: To grant unconditionally readable permissions, use allow_read keyword followed by canonicalized file. This keyword is intended to reduce size of domain policy by granting read access to library files such as GLIBC and locale files. Exception is, if ignore_global_allow_read keyword is given to a domain, entries specified by this keyword are ignored. (Example) allow_read /lib/libc-2.5.so - file_pattern: To declare pathname pattern, use file_pattern keyword followed by pathname pattern. The pathname pattern must be a canonicalized Pathname. This keyword is not applicable to neither granting execute permissions nor domain definitions. For example, canonicalized pathname that contains a process ID (i.e. /proc/PID/ files) needs to be grouped in order to make access control work well. (Example) file_pattern /proc/\$/cmdline - path_group To declare pathname group, use path_group keyword followed by name of the group and pathname pattern. For example, if you want to group all files under home directory, you can define path_group HOME-DIR-FILE /home/\/\ path_group HOME-DIR-FILE /home/\/\/\* path_group HOME-DIR-FILE /home/\/\/\/\ in the exception policy and use like allow_read @HOME-DIR-FILE to grant file access permission. - deny_rewrite: To deny overwriting already written contents of file (such as log files) by default, use deny_rewrite keyword followed by pathname pattern. Files whose pathname match the patterns are not permitted to open for writing without append mode or truncate unless the pathnames are explicitly granted using allow_rewrite keyword in domain policy. (Example) deny_rewrite /var/log/\* - aggregator To deal multiple programs as a single program, use aggregator keyword followed by name of original program and aggregated program. This keyword is intended to aggregate similar programs. For example, /usr/bin/tac and /bin/cat are similar. By specifying "aggregator /usr/bin/tac /bin/cat", you can run /usr/bin/tac in the domain for /bin/cat . For example, /usr/sbin/logrotate for Fedora Core 3 generates programs like /tmp/logrotate.\?\?\?\?\?\? and run them, but TOMOYO Linux doesn't allow using patterns for granting execute permission and defining domains. By specifying "aggregator /tmp/logrotate.\?\?\?\?\?\? /tmp/logrotate.tmp", you can run /tmp/logrotate.\?\?\?\?\?\? as if /tmp/logrotate.tmp is running. /sys/kernel/security/tomoyo/domain_policy This file contains definition of all domains and permissions that are granted to each domain. Lines from the next line to a domain definition ( any lines starting with "<kernel>") to the previous line to the next domain definitions are interpreted as access permissions for that domain. /sys/kernel/security/tomoyo/meminfo This file is to show the total RAM used to keep policy in the kernel by TOMOYO Linux in bytes. (Example) [root@tomoyo]# cat /sys/kernel/security/tomoyo/meminfo Shared: 61440 Private: 69632 Dynamic: 768 Total: 131840 You can set memory quota by writing to this file. (Example) [root@tomoyo]# echo Shared: 2097152 > /sys/kernel/security/tomoyo/meminfo [root@tomoyo]# echo Private: 2097152 > /sys/kernel/security/tomoyo/meminfo /sys/kernel/security/tomoyo/self_domain This file is to show the name of domain the caller process belongs to. (Example) [root@etch]# cat /sys/kernel/security/tomoyo/self_domain <kernel> /usr/sbin/sshd /bin/zsh /bin/cat /sys/kernel/security/tomoyo/version This file is used for getting TOMOYO Linux's version. (Example) [root@etch]# cat /sys/kernel/security/tomoyo/version 2.2.0-pre /sys/kernel/security/tomoyo/.domain_status This is a view (of a DBMS) that contains only profile number and domainnames of domain so that "ccs-setprofile" command can do line-oriented processing easily. /sys/kernel/security/tomoyo/.process_status This file is used by "ccs-ccstree" command to show "list of processes currently running" and "domains which each process belongs to" and "profile number which the domain is currently assigned" like "pstree" command. This file is writable by programs that aren't registered as policy manager. Signed-off-by: Kentaro Takeda <takedakn@nttdata.co.jp> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: Toshiharu Harada <haradats@nttdata.co.jp> Signed-off-by: James Morris <jmorris@namei.org>	2009-02-12 15:15:04 +11:00
Kentaro Takeda	c73bd6d473	Memory and pathname management functions. TOMOYO Linux performs pathname based access control. To remove factors that make pathname based access control difficult (e.g. symbolic links, "..", "//" etc.), TOMOYO Linux derives realpath of requested pathname from "struct dentry" and "struct vfsmount". The maximum length of string data is limited to 4000 including trailing '\0'. Since TOMOYO Linux uses '\ooo' style representation for non ASCII printable characters, maybe TOMOYO Linux should be able to support 16336 (which means (NAME_MAX * (PATH_MAX / (NAME_MAX + 1)) * 4 + (PATH_MAX / (NAME_MAX + 1))) including trailing '\0'), but I think 4000 is enough for practical use. TOMOYO uses only 0x21 - 0x7E (as printable characters) and 0x20 (as word delimiter) and 0x0A (as line delimiter). 0x01 - 0x20 and 0x80 - 0xFF is handled in \ooo style representation. The reason to use \ooo is to guarantee that "%s" won't damage logs. Userland program can request open("/tmp/file granted.\nAccess /tmp/file ", O_WRONLY \| O_CREAT, 0600) and logging such crazy pathname using "Access %s denied.\n" format will cause "fabrication of logs" like Access /tmp/file granted. Access /tmp/file denied. TOMOYO converts such characters to \ooo so that the logs will become Access /tmp/file\040granted.\012Access\040/tmp/file denied. and the administrator can read the logs safely using /bin/cat . Likewise, a crazy request like open("/tmp/\x01\x02\x03\x04\x05\x06\x07\x08\x09", O_WRONLY \| O_CREAT, 0600) will be processed safely by converting to Access /tmp/\001\002\003\004\005\006\007\010\011 denied. Signed-off-by: Kentaro Takeda <takedakn@nttdata.co.jp> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: Toshiharu Harada <haradats@nttdata.co.jp> Signed-off-by: James Morris <jmorris@namei.org>	2009-02-12 15:15:04 +11:00
Kentaro Takeda	f9ce1f1cda	Add in_execve flag into task_struct. This patch allows LSM modules to determine whether current process is in an execve operation or not so that they can behave differently while an execve operation is in progress. This patch is needed by TOMOYO. Please see another patch titled "LSM adapter functions." for backgrounds. Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: James Morris <jmorris@namei.org>	2009-02-12 15:15:03 +11:00
Linus Torvalds	b578f3fcca	Merge git://git.infradead.org/users/cbou/battery-2.6.29 * git://git.infradead.org/users/cbou/battery-2.6.29: pcf50633_charger: Fix typo	2009-02-11 16:28:08 -08:00
Johannes Weiner	1d93e52eb4	dmaengine: update kerneldoc Some of the kerneldoc comments in the dmaengine header describe already removed structure members. Remove them. Also add a short description for dma_device->device_is_tx_complete. Signed-off-by: Johannes Weiner <jw@emlix.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2009-02-11 17:12:27 -07:00
Takashi Iwai	26a74f1f61	ALSA: hda - Register (new) devices at reconfig The devices that have been newly added during reconfig must be registered. Otherwise they won't be visible to user-space. Signed-off-by: Takashi Iwai <tiwai@suse.de>	2009-02-12 00:13:19 +01:00
Takashi Iwai	32cf9a16f4	ALSA: mtpav - Fix initial value for input hwport Fix the initial value for input hwport. The old value (-1) may cause Oops when an realtime MIDI byte is received before the input port is explicitly given. Instead, now it's set to the broadcasting as default. Tested-by: Holger Dehnhardt <dehnhardt@ahdehnhardt.de> Cc: <stable@kernel.org> Signed-off-by: Takashi Iwai <tiwai@suse.de>	2009-02-12 00:06:42 +01:00
Mimi Zohar	523979adfa	integrity: audit update Based on discussions on linux-audit, as per Steve Grubb's request http://lkml.org/lkml/2009/2/6/269, the following changes were made: - forced audit result to be either 0 or 1. - made template names const - Added new stand-alone message type: AUDIT_INTEGRITY_RULE Signed-off-by: Mimi Zohar <zohar@us.ibm.com> Acked-by: Steve Grubb <sgrubb@redhat.com> Signed-off-by: James Morris <jmorris@namei.org>	2009-02-12 09:40:14 +11:00
Ian Dall	507e2fbaaa	w1: w1 temp calculation overflow fix Addresses http://bugzilla.kernel.org/show_bug.cgi?id=12646 When the temperature exceeds 32767 milli-degrees the temperature overflows to -32768 millidegrees. These are bothe well within the -55 - +125 degree range for the sensor. Fix overflow in left-shift of a u8. Signed-off-by: Ian Dall <ian@beware.dropbear.id.au> Signed-off-by: Evgeniy Polyakov <zbr@ioremap.net> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-02-11 14:25:37 -08:00
Paul Clements	4d48a542b4	nbd: fix I/O hang on disconnected nbds Fix a problem that causes I/O to a disconnected (or partially initialized) nbd device to hang indefinitely. To reproduce: # ioctl NBD_SET_SIZE_BLOCKS /dev/nbd23 514048 # dd if=/dev/nbd23 of=/dev/null bs=4096 count=1 ...hangs... This can also occur when an nbd device loses its nbd-client/server connection. Although we clear the queue of any outstanding I/Os after the client/server connection fails, any additional I/Os that get queued later will hang. This bug may also be the problem reported in this bug report: http://bugzilla.kernel.org/show_bug.cgi?id=12277 Testing would need to be performed to determine if the two issues are the same. This problem was introduced by the new request handling thread code ("NBD: allow nbd to be used locally", 3/2008), which entered into mainline around 2.6.25. The fix, which is fairly simple, is to restore the check for lo->sock being NULL in do_nbd_request. This causes I/O to an uninitialized nbd to immediately fail with an I/O error, as it did prior to the introduction of this bug. Signed-off-by: Paul Clements <paul.clements@steeleye.com> Reported-by: Jon Nelson <jnelson-kernel-bugzilla@jamponi.net> Acked-by: Pavel Machek <pavel@ucw.cz> Cc: <stable@kernel.org> [2.6.26.x, 2.6.27.x, 2.6.28.x] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-02-11 14:25:37 -08:00
Jeremy Fitzhardinge	9480c53e9b	mm: rearrange exit_mmap() to unlock before arch_exit_mmap Christophe Saout reported [in precursor to: http://marc.info/?l=linux-kernel&m=123209902707347&w=4]: > Note that I also some a different issue with CONFIG_UNEVICTABLE_LRU. > Seems like Xen tears down current->mm early on process termination, so > that __get_user_pages in exit_mmap causes nasty messages when the > process had any mlocked pages. (in fact, it somehow manages to get into > the swapping code and produces a null pointer dereference trying to get > a swap token) Jeremy explained: Yes. In the normal case under Xen, an in-use pagetable is "pinned", meaning that it is RO to the kernel, and all updates must go via hypercall (or writes are trapped and emulated, which is much the same thing). An unpinned pagetable is not currently in use by any process, and can be directly accessed as normal RW pages. As an optimisation at process exit time, we unpin the pagetable as early as possible (switching the process to init_mm), so that all the normal pagetable teardown can happen with direct memory accesses. This happens in exit_mmap() -> arch_exit_mmap(). The munlocking happens a few lines below. The obvious thing to do would be to move arch_exit_mmap() to below the munlock code, but I think we'd want to call it even if mm->mmap is NULL, just to be on the safe side. Thus, this patch: exit_mmap() needs to unlock any locked vmas before calling arch_exit_mmap, as the latter may switch the current mm to init_mm, which would cause the former to fail. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com> Cc: Christophe Saout <christophe@saout.de> Cc: Keir Fraser <keir.fraser@eu.citrix.com> Cc: Christophe Saout <christophe@saout.de> Cc: Alex Williamson <alex.williamson@hp.com> Cc: <stable@kernel.org> [2.6.28.x] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-02-11 14:25:37 -08:00
Jiri Slaby	3abdbf90a3	parport: parport_serial, don't bind netmos ibm 0299 Since netmos 9835 with subids 0x1014(IBM):0x0299 is now bound with serial/8250_pci, because it has no parallel ports and subdevice id isn't in the expected form, return -ENODEV from probe function. This is performed in netmos preinit_hook. Signed-off-by: Jiri Slaby <jirislaby@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-02-11 14:25:37 -08:00
Federico Cuello	89e1219004	writeback: fix break condition Commit `dcf6a79dda` ("write-back: fix nr_to_write counter") fixed nr_to_write counter, but didn't set the break condition properly. If nr_to_write == 0 after being decremented it will loop one more time before setting done = 1 and breaking the loop. [akpm@linux-foundation.org: coding-style fixes] Cc: Artem Bityutskiy <Artem.Bityutskiy@nokia.com> Acked-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-02-11 14:25:37 -08:00
Heiko Carstens	6c5979631b	syscall define: fix uml compile bug With the new system call defines we get this on uml: arch/um/sys-i386/built-in.o: In function `sys_call_table': (.rodata+0x308): undefined reference to `sys_sigprocmask' Reason for this is that uml passes the preprocessor option -Dsigprocmask=kernel_sigprocmask to gcc when compiling the kernel. This causes SYSCALL_DEFINE3(sigprocmask, ...) to be expanded to SYSCALL_DEFINEx(3, kernel_sigprocmask, ...) and finally to a system call named sys_kernel_sigprocmask. However sys_sigprocmask is missing because of this. To avoid macro expansion for the system call name just concatenate the name at first define instead of carrying it through severel levels. This was pointed out by Al Viro. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Reviewed-by: WANG Cong <wangcong@zeuux.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-02-11 14:25:36 -08:00
Carsten Otte	0e4a9b5928	ext2/xip: refuse to change xip flag during remount with busy inodes For a reason that I was unable to understand in three months of debugging, mount ext2 -o remount stopped working properly when remounting from regular operation to xip, or the other way around. According to a git bisect search, the problem was introduced with the VM_MIXEDMAP/PTE_SPECIAL rework in the vm: commit `70688e4dd1` Author: Nick Piggin <npiggin@suse.de> Date: Mon Apr 28 02:13:02 2008 -0700 xip: support non-struct page backed memory In the failing scenario, the filesystem is mounted read only via root= kernel parameter on s390x. During remount (in rc.sysinit), the inodes of the bash binary and its libraries are busy and cannot be invalidated (the bash which is running rc.sysinit resides on subject filesystem). Afterwards, another bash process (running ifup-eth) recurses into a subshell, runs dup_mm (via fork). Some of the mappings in this bash process were created from inodes that could not be invalidated during remount. Both parent and child process crash some time later due to inconsistencies in their address spaces. The issue seems to be timing sensitive, various attempts to recreate it have failed. This patch refuses to change the xip flag during remount in case some inodes cannot be invalidated. This patch keeps users from running into that issue. [akpm@linux-foundation.org: cleanup] Signed-off-by: Carsten Otte <cotte@de.ibm.com> Cc: Nick Piggin <npiggin@suse.de> Cc: Jared Hulbert <jaredeh@gmail.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-02-11 14:25:36 -08:00
Li Zefan	cfebe563bd	cgroups: fix lockdep subclasses overflow I enabled all cgroup subsystems when compiling kernel, and then: # mount -t cgroup -o net_cls xxx /mnt # mkdir /mnt/0 This showed up immediately: BUG: MAX_LOCKDEP_SUBCLASSES too low! turning off the locking correctness validator. It's caused by the cgroup hierarchy lock: for (i = 0; i < CGROUP_SUBSYS_COUNT; i++) { struct cgroup_subsys *ss = subsys[i]; if (ss->root == root) mutex_lock_nested(&ss->hierarchy_mutex, i); } Now we have 9 cgroup subsystems, and the above 'i' for net_cls is 8, but MAX_LOCKDEP_SUBCLASSES is 8. This patch uses different lockdep keys for different subsystems. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Acked-by: Paul Menage <menage@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-02-11 14:25:36 -08:00
KOSAKI Motohiro	01c4a42831	cgroups: add Li Zefan as a maintainer Add Li Zefan as co-maintainer. Acked-by: Paul Menage <menage@google.com> Acked-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-02-11 14:25:36 -08:00
Roel Kluin	c318c7ac49	rtc: t reaches -1, tested 0 With a postfix decrement t will reach -1 rather than 0, so neither the warning nor the `goto error_out' will occur. Signed-off-by: Roel Kluin <roel.kluin@gmail.com> Acked-by: Manuel Lauss <mano@roarinelk.homelinux.net> Acked-by: Alessandro Zummo <a.zummo@towertech.it> Cc: David Brownell <david-b@pacbell.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-02-11 14:25:36 -08:00
Randy Dunlap	b4870bc5ee	kernel-doc: fix syscall wrapper processing Fix kernel-doc processing of SYSCALL wrappers. The SYSCALL wrapper patches played havoc with kernel-doc for syscalls. Syscalls that were scanned for DocBook processing reported warnings like this one, for sys_tgkill: Warning(kernel/signal.c:2285): No description found for parameter 'tgkill' Warning(kernel/signal.c:2285): No description found for parameter 'pid_t' Warning(kernel/signal.c:2285): No description found for parameter 'int' because the macro parameters all "look like" function parameters, although they are not: /** * sys_tgkill - send signal to one specific thread * @tgid: the thread group ID of the thread * @pid: the PID of the thread * @sig: signal to be sent * * This syscall also checks the @tgid and returns -ESRCH even if the PID * exists but it's not belonging to the target process anymore. This * method solves the problem of threads exiting and PIDs getting reused. / SYSCALL_DEFINE3(tgkill, pid_t, tgid, pid_t, pid, int, sig) { ... This patch special-cases the handling SYSCALL_DEFINE function prototypes by expanding them to long sys_foobar(type1 arg1, type1 arg2, ...) Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-02-11 14:25:36 -08:00
Randy Dunlap	f40b45a2e4	kernel-doc: preferred ending marker and examples Fix kernel-doc-nano-HOWTO.txt to use / as the ending marker in kernel-doc examples and state that / is the preferred ending marker. Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Reported-by: Robert Love <robert.w.love@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-02-11 14:25:36 -08:00
KAMEZAWA Hiroyuki	2e9c237243	memcg: use __GFP_NOWARN in page cgroup allocation page_cgroup's page allocation at init/memory hotplug uses kmalloc() and vmalloc(). If kmalloc() failes, vmalloc() is used. This is because vmalloc() is very limited resource on 32bit systems. We want to use kmalloc() first. But in this kind of call, __GFP_NOWARN should be specified. Reported-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com> Acked-by: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-02-11 14:25:35 -08:00
Uwe Kleine-Koenig	d4097456cd	video/framebuffer: move the probe func into .devinit.text in Blackfin LCD driver Signed-off-by: Uwe Kleine-Koenig <ukleinek@strlen.de> Signed-off-by: Mike Frysinger <vapier.adi@gmail.com> Signed-off-by: Bryan Wu <cooloney@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-02-11 14:25:35 -08:00
Marcel Selhorst	7dcce1334f	tpm: correct email address for tpm_infineon-driver Update my email address. Signed-off-by: Marcel Selhorst <m.selhorst@sirrix.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-02-11 14:25:35 -08:00
MinChan Kim	508b9f8efd	mm: fix mlocked page counter mismatch When I tested following program, I found that the mlocked counter is strange. It cannot free some mlocked pages. It is because try_to_unmap_file() doesn't check real page mappings in vmas. That is because the goal of an address_space for a file is to find all processes into which the file's specific interval is mapped. It is related to the file's interval, not to pages. Even if the page isn't really mapped by the vma, it returns SWAP_MLOCK since the vma has VM_LOCKED, then calls try_to_mlock_page. After this the mlocked counter is increased again. COWed anon page in a file-backed vma could be a such case. This patch resolves it. -- my test program -- int main() { mlockall(MCL_CURRENT); return 0; } -- before -- root@barrios-target-linux:~# cat /proc/meminfo \| egrep 'Mlo\|Unev' Unevictable: 0 kB Mlocked: 0 kB -- after -- root@barrios-target-linux:~# cat /proc/meminfo \| egrep 'Mlo\|Unev' Unevictable: 8 kB Mlocked: 8 kB Signed-off-by: MinChan Kim <minchan.kim@gmail.com> Acked-by: Lee Schermerhorn <lee.schermerhorn@hp.com> Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Tested-by: Lee Schermerhorn <lee.schermerhorn@hp.com> Cc: Hugh Dickins <hugh@veritas.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-02-11 14:25:35 -08:00

... 24 25 26 27 28 ...

132681 commits