The setup_rctl call was making a call into the ring structure after it had
been freed. This was causing a panic on shutdown. This call wasn't
necessary since it is possible to get the needed index from
adapter->vfs_allocated_count.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
With the removal of duplicate unpack_to_rootfs() (commit
df52092f3c) the messages displayed do not
actually correspond to what the kernel is doing. In addition, depending
if ramdisks are supported or not, the messages are not at all the same.
So keep the messages more in sync with what is really doing the kernel,
and only display a second message in case of failure. This also ensure
that the printk message cannot be split by other printk's.
Signed-off-by: Eric Piel <eric.piel@tremplin-utc.net>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
NOMMU mmap() has an option controlled by a sysctl variable that determines
whether the allocations made by do_mmap_private() should have the excess
space trimmed off and returned to the allocator. Make the initial setting
of this variable a Kconfig configuration option.
The reason there can be excess space is that the allocator only allocates
in power-of-2 size chunks, but mmap()'s can be made in sizes that aren't a
power of 2.
There are two alternatives:
(1) Keep the excess as dead space. The dead space then remains unused for the
lifetime of the mapping. Mappings of shared objects such as libc, ld.so
or busybox's text segment may retain their dead space forever.
(2) Return the excess to the allocator. This means that the dead space is
limited to less than a page per mapping, but it means that for a transient
process, there's more chance of fragmentation as the excess space may be
reused fairly quickly.
During the boot process, a lot of transient processes are created, and
this can cause a lot of fragmentation as the pagecache and various slabs
grow greatly during this time.
By turning off the trimming of excess space during boot and disabling
batching of frees, Coldfire can manage to boot.
A better way of doing things might be to have /sbin/init turn this option
off. By that point libc, ld.so and init - which are all long-duration
processes - have all been loaded and trimmed.
Reported-by: Lanttor Guo <lanttor.guo@freescale.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: Lanttor Guo <lanttor.guo@freescale.com>
Cc: Greg Ungerer <gerg@snapgear.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Clamp zone_batchsize() to 0 under NOMMU conditions to stop
free_hot_cold_page() from queueing and batching frees.
The problem is that under NOMMU conditions it is really important to be
able to allocate large contiguous chunks of memory, but when munmap() or
exit_mmap() releases big stretches of memory, return of these to the buddy
allocator can be deferred, and when it does finally happen, it can be in
small chunks.
Whilst the fragmentation this incurs isn't so much of a problem under MMU
conditions as userspace VM is glued together from individual pages with
the aid of the MMU, it is a real problem if there isn't an MMU.
By clamping the page freeing queue size to 0, pages are returned to the
allocator immediately, and the buddy detector is more likely to be able to
glue them together into large chunks immediately, and fragmentation is
less likely to occur.
By disabling batching of frees, and by turning off the trimming of excess
space during boot, Coldfire can manage to boot.
Reported-by: Lanttor Guo <lanttor.guo@freescale.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: Lanttor Guo <lanttor.guo@freescale.com>
Cc: Greg Ungerer <gerg@snapgear.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Use roundown_pow_of_two(N) in zone_batchsize() rather than (1 <<
(fls(N)-1)) as they are equivalent, and with the former it is easier to
see what is going on.
Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: Lanttor Guo <lanttor.guo@freescale.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The isl29003 does not interpret the return value of
i2c_smbus_write_byte_data() correctly and hence causes an error on system
resume.
Also introduce power_state_before_suspend and restore the chip's power
state upon wakeup.
Signed-off-by: Daniel Mack <daniel@caiaq.de>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The cyblafb driver is removed so remove its last trace in the makefile.
Signed-off-by: Krzysztof Helt <krzysztof.h1@wp.pl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
If alloc_vmap_area() fails the allocated struct vmap_area has to be freed.
Signed-off-by: Ralph Wuerthner <ralphw@linux.vnet.ibm.com>
Reviewed-by: Christoph Lameter <cl@linux-foundation.org>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
Cc: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Change last "i386" to X86-32 as is used throughout the rest of the file.
Change combination of X86-32,X86-64 to just X86, as is done throughout the
rest of the file.
Add a note that hyphens and underscores are equivalent in parameter names,
with examples.
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Jan Engelhardt <jengelh@medozas.de>
Cc: Christopher Sylvain <chris.sylvain@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The software fillrect routines do not work properly when the number of
pixels per machine word is not an integer. To see that, run the following
command on a fbdev console with a 24bpp video mode, using a
non-accelerated driver such as (u)vesafb:
reset ; echo -e '\e[41mtest\e[K'
The expected result is 'test' displayed on a line with red background.
Instead of that, 'test' has a red background, but the rest of the line
(rendered using fillrect()) contains a distored colorful pattern.
This patch fixes the problem by correctly computing rotation shifts. It
has been tested in a 24bpp mode on 32- and 64-bit little-endian machines.
Signed-off-by: Michal Januszewski <spock@gentoo.org>
Acked-by: Krzysztof Helt <krzysztof.h1@wp.pl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When /proc/sys/vm/oom_kill_allocating_task is set for large systems that
want to avoid the lengthy tasklist scan, it's possible to livelock if
current is ineligible for oom kill. This normally happens when it is set
to OOM_DISABLE, but is also possible if any threads are sharing the same
->mm with a different tgid.
So change __out_of_memory() to fall back to the full task-list scan if it
was unable to kill `current'.
Cc: Nick Piggin <npiggin@suse.de>
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fix a problem where the generic block based fiemap stuff would not
properly set FIEMAP_EXTENT_LAST on the last extent. I've reworked things
to keep track if we go past the EOF, and mark the last extent properly.
The problem was reported by and tested by Eric Sandeen.
Tested-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Josef Bacik <jbacik@redhat.com>
Cc: <linux-ext4@vger.kernel.org>
Cc: <xfs-masters@oss.sgi.com>
Cc: <linux-btrfs@vger.kernel.org>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <Joel.Becker@oracle.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When building with gcc 3.2 I get thousands of warnings such as
include/linux/gfp.h: In function `allocflags_to_migratetype':
include/linux/gfp.h:105: warning: null format string
due to passing a NULL format string to warn_slowpath() in
#define __WARN() warn_slowpath(__FILE__, __LINE__, NULL)
Split this case out into a separate call. This also shrinks the kernel
slightly:
text data bss dec hex filename
4802274 707668 712704 6222646 5ef336 vmlinux
text data bss dec hex filename
4799027 703572 712704 6215303 5ed687 vmlinux
due to removeing one argument from the commonly-called __WARN().
[akpm@linux-foundation.org: reduce scope of `empty']
Acked-by: Jesper Nilsson <jesper.nilsson@axis.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
kernel boot parameter `hashdist' now defaults on for all 64bit NUMA.
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
There is what we believe to be a false positive reported by lockdep.
inotify_inode_queue_event() => take inotify_mutex => kernel_event() =>
kmalloc() => SLOB => alloc_pages_node() => page reclaim => slab reclaim =>
dcache reclaim => inotify_inode_is_dead => take inotify_mutex => deadlock
The plan is to fix this via lockdep annotation, but that is proving to be
quite involved.
The patch flips the allocation over to GFP_NFS to shut the warning up, for
the 2.6.30 release.
Hopefully we will fix this for real in 2.6.31. I'll queue a patch in -mm
to switch it back to GFP_KERNEL so we don't forget.
=================================
[ INFO: inconsistent lock state ]
2.6.30-rc2-next-20090417 #203
---------------------------------
inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-W} usage.
kswapd0/380 [HC0[0]:SC0[0]:HE1:SE1] takes:
(&inode->inotify_mutex){+.+.?.}, at: [<ffffffff8112f1b5>] inotify_inode_is_dead+0x35/0xb0
{RECLAIM_FS-ON-W} state was registered at:
[<ffffffff81079188>] mark_held_locks+0x68/0x90
[<ffffffff810792a5>] lockdep_trace_alloc+0xf5/0x100
[<ffffffff810f5261>] __kmalloc_node+0x31/0x1e0
[<ffffffff81130652>] kernel_event+0xe2/0x190
[<ffffffff81130826>] inotify_dev_queue_event+0x126/0x230
[<ffffffff8112f096>] inotify_inode_queue_event+0xc6/0x110
[<ffffffff8110444d>] vfs_create+0xcd/0x140
[<ffffffff8110825d>] do_filp_open+0x88d/0xa20
[<ffffffff810f6b68>] do_sys_open+0x98/0x140
[<ffffffff810f6c50>] sys_open+0x20/0x30
[<ffffffff8100c272>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff
irq event stamp: 690455
hardirqs last enabled at (690455): [<ffffffff81564fe4>] _spin_unlock_irqrestore+0x44/0x80
hardirqs last disabled at (690454): [<ffffffff81565372>] _spin_lock_irqsave+0x32/0xa0
softirqs last enabled at (690178): [<ffffffff81052282>] __do_softirq+0x202/0x220
softirqs last disabled at (690157): [<ffffffff8100d50c>] call_softirq+0x1c/0x50
other info that might help us debug this:
2 locks held by kswapd0/380:
#0: (shrinker_rwsem){++++..}, at: [<ffffffff810d0bd7>] shrink_slab+0x37/0x180
#1: (&type->s_umount_key#17){++++..}, at: [<ffffffff8110cfbf>] shrink_dcache_memory+0x11f/0x1e0
stack backtrace:
Pid: 380, comm: kswapd0 Not tainted 2.6.30-rc2-next-20090417 #203
Call Trace:
[<ffffffff810789ef>] print_usage_bug+0x19f/0x200
[<ffffffff81018bff>] ? save_stack_trace+0x2f/0x50
[<ffffffff81078f0b>] mark_lock+0x4bb/0x6d0
[<ffffffff810799e0>] ? check_usage_forwards+0x0/0xc0
[<ffffffff8107b142>] __lock_acquire+0xc62/0x1ae0
[<ffffffff810f478c>] ? slob_free+0x10c/0x370
[<ffffffff8107c0a1>] lock_acquire+0xe1/0x120
[<ffffffff8112f1b5>] ? inotify_inode_is_dead+0x35/0xb0
[<ffffffff81562d43>] mutex_lock_nested+0x63/0x420
[<ffffffff8112f1b5>] ? inotify_inode_is_dead+0x35/0xb0
[<ffffffff8112f1b5>] ? inotify_inode_is_dead+0x35/0xb0
[<ffffffff81012fe9>] ? sched_clock+0x9/0x10
[<ffffffff81077165>] ? lock_release_holdtime+0x35/0x1c0
[<ffffffff8112f1b5>] inotify_inode_is_dead+0x35/0xb0
[<ffffffff8110c9dc>] dentry_iput+0xbc/0xe0
[<ffffffff8110cb23>] d_kill+0x33/0x60
[<ffffffff8110ce23>] __shrink_dcache_sb+0x2d3/0x350
[<ffffffff8110cffa>] shrink_dcache_memory+0x15a/0x1e0
[<ffffffff810d0cc5>] shrink_slab+0x125/0x180
[<ffffffff810d1540>] kswapd+0x560/0x7a0
[<ffffffff810ce160>] ? isolate_pages_global+0x0/0x2c0
[<ffffffff81065a30>] ? autoremove_wake_function+0x0/0x40
[<ffffffff8107953d>] ? trace_hardirqs_on+0xd/0x10
[<ffffffff810d0fe0>] ? kswapd+0x0/0x7a0
[<ffffffff8106555b>] kthread+0x5b/0xa0
[<ffffffff8100d40a>] child_rip+0xa/0x20
[<ffffffff8100cdd0>] ? restore_args+0x0/0x30
[<ffffffff81065500>] ? kthread+0x0/0xa0
[<ffffffff8100d400>] ? child_rip+0x0/0x20
[eparis@redhat.com: fix audit too]
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Matt Mackall <mpm@selenic.com>
Cc: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The ring buffer benchmark/test runs a producer for 10 seconds.
This is done with preemption and interrupts enabled. But if the kernel
is not compiled with CONFIG_PREEMPT, it basically stops everything
but interrupts for 10 seconds.
Although this is just a test and is not for production, this attribute
can be quite annoying. It can also spawn badness elsewhere.
This patch solves the issues by calling "cond_resched" when the system
is not compiled with CONFIG_PREEMPT. It also keeps track of the time
spent to call cond_resched such that it does not go against the
time calculations. That is, if the task schedules away, the time scheduled
out is removed from the test data. Note, this only works for non PREEMPT
because we do not know when the task is scheduled out if we have PREEMPT
enabled.
[ Impact: prevent test from stopping the world for 10 seconds ]
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
This patch removes bd_lock spinlock (inside jsm_board structure).
The lock is initialized in the probe function and not used anymore.
Signed-off-by: Breno Leitao <leitao@linux.vnet.ibm.com>
Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This is one area where we can't just magic away the bizarre use of
CLOCK_TICK_RATE as it leaks to user space APIs. It also means the visible
CLOCK_TICK_RATE is frozen for architectures which is horrible.
We need to fix this somehow
Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
If lockd is signalled soon enough after restart then locks_start_grace()
will try to re-add an entry to a list and trigger a lock corruption
warning.
Thanks to Wang Chen for the problem report and diagnosis.
WARNING: at lib/list_debug.c:26 __list_add+0x27/0x5c()
...
list_add corruption. next->prev should be prev (ef8fe958), but was ef8ff128. (next=ef8ff128).
...
Pid: 23062, comm: lockd Tainted: G W 2.6.30-rc2 #3
Call Trace:
[<c042d5b5>] warn_slowpath+0x71/0xa0
[<c0422a96>] ? update_curr+0x11d/0x125
[<c044b12d>] ? trace_hardirqs_on_caller+0x18/0x150
[<c044b270>] ? trace_hardirqs_on+0xb/0xd
[<c051c61a>] ? _raw_spin_lock+0x53/0xfa
[<c051c89f>] __list_add+0x27/0x5c
[<ef8f6daa>] locks_start_grace+0x22/0x30 [lockd]
[<ef8f34da>] set_grace_period+0x39/0x53 [lockd]
[<c06b8921>] ? lock_kernel+0x1c/0x28
[<ef8f3558>] lockd+0x64/0x164 [lockd]
[<c044b12d>] ? trace_hardirqs_on_caller+0x18/0x150
[<c04227b0>] ? complete+0x34/0x3e
[<ef8f34f4>] ? lockd+0x0/0x164 [lockd]
[<ef8f34f4>] ? lockd+0x0/0x164 [lockd]
[<c043dd42>] kthread+0x45/0x6b
[<c043dcfd>] ? kthread+0x0/0x6b
[<c0403c23>] kernel_thread_helper+0x7/0x10
Reported-by: Wang Chen <wangchen@cn.fujitsu.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Cc: stable@kernel.org
When a new wimax_dev is created, it's state has to be __WIMAX_ST_NULL
until wimax_dev_add() is succesfully called. This allows calls into
the stack that happen before said time to be rejected.
Until now, the state was being set (by mistake) to UNINITIALIZED,
which was allowing calls such as wimax_report_rfkill_hw() to go
through even when a call to wimax_dev_add() had failed; that was
causing an oops when touching uninitialized data.
This situation is normal when the device starts reporting state before
the whole initialization has been completed. It just has to be dealt
with.
Signed-off-by: Inaky Perez-Gonzalez <inaky@linux.intel.com>
When sending a message to user space using wimax_msg(), if nla_put()
fails, correctly interpret the return code from wimax_msg_alloc() as
an err ptr and return the error code instead of crashing (as it is
assuming than non-NULL means the pointer is ok).
Signed-off-by: Inaky Perez-Gonzalez <inaky@linux.intel.com>
After 2f9092e102 "Fix i_mutex vs. readdir
handling in nfsd" (and 14f7dd63 "Copy XFS readdir hack into nfsd code"),
an entry may be removed between the first mutex_unlock and the second
mutex_lock. In this case, lookup_one_len() will return a negative
dentry. Check for this case to avoid a NULL dereference.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Reviewed-by: J. R. Okajima <hooanon05@yahoo.co.jp>
Cc: stable@kernel.org
Ingo Molnar thought the code would be cleaner if we used a function call
instead of a goto for moving the tail page. After implementing this,
it seems that gcc still inlines the result and the output is pretty much
the same. Since this is considered a cleaner approach, might as well
implement it.
[ Impact: code clean up ]
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
The result of the allocation of the ring buffer read page in the
ring buffer bench mark does not check the return to see if a page
was actually allocated. This patch fixes that.
[ Impact: avoid NULL dereference ]
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
The code in __rb_reserve_next checks on page overflow if it is the
original commiter and then resets the page back to the original
setting. Although this is fine, and the code is correct, it is
a bit fragil. Some experimental work I did breaks it easily.
The better and more robust solution is to have all commiters that
overflow the page, simply subtract what they added.
[ Impact: more robust ring buffer account management ]
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Use -I$(src) to add the current directory the include path.
[ Impact: cleanup ]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This compiler warning:
CC kernel/trace/trace_output.o
kernel/trace/trace_output.c: In function ‘register_ftrace_event’:
kernel/trace/trace_output.c:544: warning: ‘list’ may be used uninitialized in this function
Is wrong as 'list' is always initialized - but GCC (4.3.2) does not
recognize this relationship properly.
Work around the warning by initializing the variable to NULL.
[ Impact: fix false positive compiler warning ]
Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This attempts to clarify names utilized during block I/O remap
operations (partition, volume manager). It correctly matches up the
/from/ information for both device & sector. This takes in the concept
from Kosaki Motohiro and extends it to include better naming for the
"device_from" field.
[ Impact: cleanup ]
Signed-off-by: Alan D. Brunelle <alan.brunelle@hp.com>
Reviewed-by: Li Zefan <lizf@cn.fujitsu.com>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <49FF4FAE.3000301@hp.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The orig_cpu parameter in trace_sched_migrate_task() is not necessary,
it can be got by using task_cpu(p) in the probe.
[ Impact: micro-optimization ]
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
[ modified from Mathieu's patch. The original patch is at:
http://marc.info/?l=linux-kernel&m=123791201716239&w=2 ]
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Cc: fweisbec@gmail.com
Cc: rostedt@goodmis.org
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: zhaolei@cn.fujitsu.com
Cc: laijs@cn.fujitsu.com
LKML-Reference: <49FFFDB7.1050402@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The mem= option will truncate the memory map at a specified address so
it's not possible to register nodes with memory beyond the e820 upper
bound.
unparse_node() is only called when then node had memory associated with
it, although with the mem= option it is no longer addressable.
[ Impact: fix boot hang on certain (large) systems ]
Reported-by: "Zhang, Yanmin" <yanmin_zhang@linux.intel.com>
Signed-off-by: David Rientjes <rientjes@google.com>
Acked-by: Jack Steiner <steiner@sgi.com>
LKML-Reference: <alpine.DEB.2.00.0905051248150.20021@chino.kir.corp.google.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
A module will add/remove its trace events when it gets loaded/unloaded, so
the ftrace_events list is not "const", and concurrent access needs to be
protected.
This patch thus fixes races between loading/unloding modules and read
'available_events' or read/write 'set_event', etc.
Below shows how to reproduce the race:
# for ((; ;)) { cat /mnt/tracing/available_events; } > /dev/null &
# for ((; ;)) { insmod trace-events-sample.ko; rmmod sample; } &
After a while:
BUG: unable to handle kernel paging request at 0010011c
IP: [<c1080f27>] t_next+0x1b/0x2d
...
Call Trace:
[<c10c90e6>] ? seq_read+0x217/0x30d
[<c10c8ecf>] ? seq_read+0x0/0x30d
[<c10b4c19>] ? vfs_read+0x8f/0x136
[<c10b4fc3>] ? sys_read+0x40/0x65
[<c1002a68>] ? sysenter_do_call+0x12/0x36
[ Impact: fix races when concurrent accessing ftrace_events list ]
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Tom Zanussi <tzanussi@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <4A00F709.3080800@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
When unloading a module, memory allocated by init_preds() and
trace_define_field() is not freed.
[ Impact: fix memory leak ]
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Tom Zanussi <tzanussi@gmail.com>
LKML-Reference: <4A00F6E0.3040503@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Normally a config should be default to n. This patch also makes the
sample module-only, like SAMPLE_MARKERS and SAMPLE_TRACEPOINTS.
[ Impact: don't build trace event sample by default ]
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <4A00F6C0.8090803@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The sample is useful for testing, and I'm using it. But after
loading the module, it keeps saying hi every 10 seconds, this may
be disturbing.
Also Steven said commenting out the "hi" helped in causing races. :)
[ Impact: make testing a bit easier ]
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <4A00F6AD.2070008@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
On mount, "sec=ntlmssp" can now be specified to allow
"rawntlmssp" security to be enabled during
CIFS session establishment/authentication (ntlmssp used to
require specifying krb5 which was counterintuitive).
Signed-off-by: Steve French <sfrench@us.ibm.com>
This patch adds code that can benchmark the ring buffer as well as
test it. This code can be compiled into the kernel (not recommended)
or as a module.
A separate ring buffer is used to not interfer with other users, like
ftrace. It creates a producer and a consumer (option to disable creation
of the consumer) and will run for 10 seconds, then sleep for 10 seconds
and then repeat.
While running, the producer will write 10 byte loads into the ring
buffer with just putting in the current CPU number. The reader will
continually try to read the buffer. The reader will alternate from reading
the buffer via event by event, or by full pages.
The output is a pr_info, thus it will fill up the syslogs.
Starting ring buffer hammer
End ring buffer hammer
Time: 9000349 (usecs)
Overruns: 12578640
Read: 5358440 (by events)
Entries: 0
Total: 17937080
Missed: 0
Hit: 17937080
Entries per millisec: 1993
501 ns per entry
Sleeping for 10 secs
Starting ring buffer hammer
End ring buffer hammer
Time: 9936350 (usecs)
Overruns: 0
Read: 28146644 (by pages)
Entries: 74
Total: 28146718
Missed: 0
Hit: 28146718
Entries per millisec: 2832
353 ns per entry
Sleeping for 10 secs
Time: is the time the test ran
Overruns: the number of events that were overwritten and not read
Read: the number of events read (either by pages or events)
Entries: the number of entries left in the buffer
(the by pages will only read full pages)
Total: Entries + Read + Overruns
Missed: the number of entries that failed to write
Hit: the number of entries that were written
The above example shows that it takes ~353 nanosecs per entry when
there is a reader, reading by pages (and no overruns)
The event by event reader slowed the producer down to 501 nanosecs.
[ Impact: see how changes to the ring buffer affect stability and performance ]
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
In the hot path of the ring buffer "__rb_reserve_next" there's a big
if statement that does not even return back to the work flow.
code;
if (cross to next page) {
[ lots of code ]
return;
}
more code;
The condition is even the unlikely path, although we do not denote it
with an unlikely because gcc is fine with it. The condition is true when
the write crosses a page boundary, and we need to start at a new page.
Having this if statement makes it hard to read, but calling another
function to do the work is also not appropriate, because we are using a lot
of variables that were set before the if statement, and we do not want to
send them as parameters.
This patch changes it to a goto:
code;
if (cross to next page)
goto next_page;
more code;
return;
next_page:
[ lots of code]
This makes the code easier to understand, and a bit more obvious.
The output from gcc is practically identical. For some reason, gcc decided
to use different registers when I switched it to a goto. But other than that,
the logic is the same.
[ Impact: easier to read code ]
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
We were not setting the SMB uid in NTLMSSP authenticate
request which could lead to INVALID_PARAMETER error
on 2nd session setup.
Signed-off-by: Steve French <sfrench@us.ibm.com>
* 'drm-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6:
drm/r128: fix r128 ioremaps to use ioremap_wc.
drm: cleanup properly in drm_get_dev() failure paths
drm: clean the map list before destroying the hash table
drm: remove unreachable code in drm_sysfs.c
drm: add control node checks missing from kms merge
drm/kms: don't try to shortcut drm mode set function
drm/radeon: bump minor version for occlusion queries support
When adding the EXPORT_SYMBOL to some of the tracing API, I accidently
used EXPORT_SYMBOL instead of EXPORT_SYMBOL_GPL. This patch fixes
that mistake.
[ Impact: export the tracing code only for GPL modules ]
Reported-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
The only references in the kernel to the .text.sched section are in
recordmcount.pl. Since the code it has is intended to be example code
it should refer to real kernel sections. So change it to .sched.text
instead.
[ Impact: consistency in comments ]
Signed-off-by: Tim Abbott <tabbott@mit.edu>
LKML-Reference: <1241136371-10768-1-git-send-email-tabbott@mit.edu>
Acked-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
madvise(MADV_WILLNEED) forces page cache readahead on a range of memory
backed by a file. The assumption is made that the page required is
order-0 and "normal" page cache.
On hugetlbfs, this assumption is not true and order-0 pages are
allocated and inserted into the hugetlbfs page cache. This leaks
hugetlbfs page reservations and can cause BUGs to trigger related to
corrupted page tables.
This patch causes MADV_WILLNEED to be ignored for hugetlbfs-backed
regions.
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
As a precaution, it is best to disable writing to the ring buffers
when reseting them.
[ Impact: prevent weird things if write happens during reset ]
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>