linux-pinenote

Author	SHA1	Message	Date
Frederic Weisbecker	1268fbc746	nohz: Remove tick_nohz_idle_enter_norcu() / tick_nohz_idle_exit_norcu() Those two APIs were provided to optimize the calls of tick_nohz_idle_enter() and rcu_idle_enter() into a single irq disabled section. This way no interrupt happening in-between would needlessly process any RCU job. Now we are talking about an optimization for which benefits have yet to be measured. Let's start simple and completely decouple idle rcu and dyntick idle logics to simplify. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Reviewed-by: Josh Triplett <josh@joshtriplett.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2011-12-11 10:31:57 -08:00
Paul E. McKenney	c4f3060843	sched: Add is_idle_task() to handle invalidated uses of idle_cpu() Commit `908a3283` (Fix idle_cpu()) invalidated some uses of idle_cpu(), which used to say whether or not the CPU was running the idle task, but now instead says whether or not the CPU is running the idle task in the absence of pending wakeups. Although this new implementation gives a better answer to the question "is this CPU idle?", it also invalidates other uses that were made of idle_cpu(). This commit therefore introduces a new is_idle_task() API member that determines whether or not the specified task is one of the idle tasks, allowing open-coded "->pid == 0" sequences to be replaced by something more meaningful. Suggested-by: Josh Triplett <josh@joshtriplett.org> Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2011-12-11 10:31:47 -08:00
Paul E. McKenney	4145fa7fbe	rcu: Deconfuse dynticks entry-exit tracing The trace_rcu_dyntick() trace event did not print both the old and the new value of the nesting level, and furthermore printed only the low-order 32 bits of it. This could result in some confusion when interpreting trace-event dumps, so this commit prints both the old and the new value, prints the full 64 bits, and also selects the process-entry/exit increment to print nicely in hexadecimal. Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>	2011-12-11 10:31:42 -08:00
Paul E. McKenney	0c53dd8b31	rcu: Introduce raw SRCU read-side primitives The RCU implementations, including SRCU, are designed to be used in a lock-like fashion, so that the read-side lock and unlock primitives must execute in the same context for any given read-side critical section. This constraint is enforced by lockdep-RCU. However, there is a need to enter an SRCU read-side critical section within the context of an exception and then exit in the context of the task that encountered the exception. The cost of this capability is that the read-side operations incur the overhead of disabling interrupts. Note that although the current implementation allows a given read-side critical section to be entered by one task and then exited by another, all known possible implementations that allow this have scalability problems. Therefore, a given read-side critical section must be exited by the same task that entered it, though perhaps from an interrupt or exception handler running within that task's context. But if you are thinking in terms of interrupt handlers, make sure that you have considered the possibility of threaded interrupt handlers. Credit goes to Peter Zijlstra for suggesting use of the existing _raw suffix to indicate disabling lockdep over the earlier "bulkref" names. Requested-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Tested-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>	2011-12-11 10:31:40 -08:00
Frederic Weisbecker	2bbb6817c0	nohz: Allow rcu extended quiescent state handling seperately from tick stop It is assumed that rcu won't be used once we switch to tickless mode and until we restart the tick. However this is not always true, as in x86-64 where we dereference the idle notifiers after the tick is stopped. To prepare for fixing this, add two new APIs: tick_nohz_idle_enter_norcu() and tick_nohz_idle_exit_norcu(). If no use of RCU is made in the idle loop between tick_nohz_enter_idle() and tick_nohz_exit_idle() calls, the arch must instead call the new *_norcu() version such that the arch doesn't need to call rcu_idle_enter() and rcu_idle_exit(). Otherwise the arch must call tick_nohz_enter_idle() and tick_nohz_exit_idle() and also call explicitly: - rcu_idle_enter() after its last use of RCU before the CPU is put to sleep. - rcu_idle_exit() before the first use of RCU after the CPU is woken up. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Mike Frysinger <vapier@gentoo.org> Cc: Guan Xuetao <gxt@mprc.pku.edu.cn> Cc: David Miller <davem@davemloft.net> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Hans-Christian Egtvedt <hans-christian.egtvedt@atmel.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Russell King <linux@arm.linux.org.uk> Cc: Paul Mackerras <paulus@samba.org> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Paul Mundt <lethal@linux-sh.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2011-12-11 10:31:36 -08:00
Frederic Weisbecker	280f06774a	nohz: Separate out irq exit and idle loop dyntick logic The tick_nohz_stop_sched_tick() function, which tries to delay the next timer tick as long as possible, can be called from two places: - From the idle loop to start the dytick idle mode - From interrupt exit if we have interrupted the dyntick idle mode, so that we reprogram the next tick event in case the irq changed some internal state that requires this action. There are only few minor differences between both that are handled by that function, driven by the ts->inidle cpu variable and the inidle parameter. The whole guarantees that we only update the dyntick mode on irq exit if we actually interrupted the dyntick idle mode, and that we enter in RCU extended quiescent state from idle loop entry only. Split this function into: - tick_nohz_idle_enter(), which sets ts->inidle to 1, enters dynticks idle mode unconditionally if it can, and enters into RCU extended quiescent state. - tick_nohz_irq_exit() which only updates the dynticks idle mode when ts->inidle is set (ie: if tick_nohz_idle_enter() has been called). To maintain symmetry, tick_nohz_restart_sched_tick() has been renamed into tick_nohz_idle_exit(). This simplifies the code and micro-optimize the irq exit path (no need for local_irq_save there). This also prepares for the split between dynticks and rcu extended quiescent state logics. We'll need this split to further fix illegal uses of RCU in extended quiescent states in the idle loop. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Mike Frysinger <vapier@gentoo.org> Cc: Guan Xuetao <gxt@mprc.pku.edu.cn> Cc: David Miller <davem@davemloft.net> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Hans-Christian Egtvedt <hans-christian.egtvedt@atmel.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Russell King <linux@arm.linux.org.uk> Cc: Paul Mackerras <paulus@samba.org> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Paul Mundt <lethal@linux-sh.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>	2011-12-11 10:31:35 -08:00
Paul E. McKenney	867f236bd1	rcu: Make srcu_read_lock_held() call common lockdep-enabled function A common debug_lockdep_rcu_enabled() function is used to check whether RCU lockdep splats should be reported, but srcu_read_lock() does not use it. This commit therefore brings srcu_read_lock_held() up to date. Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>	2011-12-11 10:31:34 -08:00
Paul E. McKenney	ff195cb69b	rcu: Warn when srcu_read_lock() is used in an extended quiescent state Catch SRCU up to the other variants of RCU by making PROVE_RCU complain if either srcu_read_lock() or srcu_read_lock_held() are used from within RCU-idle mode. Frederic reworked this to allow for the new versions of his patches that check for extended quiescent states. Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>	2011-12-11 10:31:33 -08:00
Paul E. McKenney	d8ab29f8be	rcu: Remove one layer of abstraction from PROVE_RCU checking Simplify things a bit by substituting the definitions of the single-line rcu_read_acquire(), rcu_read_release(), rcu_read_acquire_bh(), rcu_read_release_bh(), rcu_read_acquire_sched(), and rcu_read_release_sched() functions at their call points. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>	2011-12-11 10:31:32 -08:00
Frederic Weisbecker	00f49e5729	rcu: Warn when rcu_read_lock() is used in extended quiescent state We are currently able to detect uses of rcu_dereference_check() inside extended quiescent states (such as the RCU-free window in idle). But rcu_read_lock() and friends can be used without rcu_dereference(), so that the earlier commit checking for use of rcu_dereference() and friends while in RCU idle mode miss some error conditions. This commit therefore adds extended quiescent state checking to rcu_read_lock() and friends. Uses of RCU from within RCU-idle mode are totally ignored by RCU, hence the importance of these checks. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>	2011-12-11 10:31:31 -08:00
Frederic Weisbecker	e6b80a3b09	rcu: Detect illegal rcu dereference in extended quiescent state Report that none of the rcu read lock maps are held while in an RCU extended quiescent state (the section between rcu_idle_enter() and rcu_idle_exit()). This helps detect any use of rcu_dereference() and friends from within the section in idle where RCU is not allowed. This way we can guarantee an extended quiescent window where the CPU can be put in dyntick idle mode or can simply aoid to be part of any global grace period completion while in the idle loop. Uses of RCU from such mode are totally ignored by RCU, hence the importance of these checks. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>	2011-12-11 10:31:30 -08:00
Paul E. McKenney	91afaf3002	rcu: Add failure tracing to rcutorture Trace the rcutorture RCU accesses and dump the trace buffer when the first failure is detected. Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>	2011-12-11 10:31:26 -08:00
Paul E. McKenney	9b2e4f1880	rcu: Track idleness independent of idle tasks Earlier versions of RCU used the scheduling-clock tick to detect idleness by checking for the idle task, but handled idleness differently for CONFIG_NO_HZ=y. But there are now a number of uses of RCU read-side critical sections in the idle task, for example, for tracing. A more fine-grained detection of idleness is therefore required. This commit presses the old dyntick-idle code into full-time service, so that rcu_idle_enter(), previously known as rcu_enter_nohz(), is always invoked at the beginning of an idle loop iteration. Similarly, rcu_idle_exit(), previously known as rcu_exit_nohz(), is always invoked at the end of an idle-loop iteration. This allows the idle task to use RCU everywhere except between consecutive rcu_idle_enter() and rcu_idle_exit() calls, in turn allowing architecture maintainers to specify exactly where in the idle loop that RCU may be used. Because some of the userspace upcall uses can result in what looks to RCU like half of an interrupt, it is not possible to expect that the irq_enter() and irq_exit() hooks will give exact counts. This patch therefore expands the ->dynticks_nesting counter to 64 bits and uses two separate bitfields to count process/idle transitions and interrupt entry/exit transitions. It is presumed that userspace upcalls do not happen in the idle loop or from usermode execution (though usermode might do a system call that results in an upcall). The counter is hard-reset on each process/idle transition, which avoids the interrupt entry/exit error from accumulating. Overflow is avoided by the 64-bitness of the ->dyntick_nesting counter. This commit also adds warnings if a non-idle task asks RCU to enter idle state (and these checks will need some adjustment before applying Frederic's OS-jitter patches (http://lkml.org/lkml/2011/10/7/246). In addition, validation of ->dynticks and ->dynticks_nesting is added. Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>	2011-12-11 10:31:24 -08:00
Guennadi Liakhovetski	4c0b036db8	[media] V4L: soc-camera: fix compiler warnings on 64-bit platforms On 64-bit platforms assigning a pointer to a 32-bit variable causes a compiler warning and cannot actually work. Soc-camera currently doesn't support any 64-bit systems, but such platforms can be added in the and in any case compiler warnings should be avoided. Signed-off-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de> Acked-by: Janusz Krzysztofik <jkrzyszt@tis.icnet.pl> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2011-12-11 09:34:00 -02:00
Stefan Nilsson XK	6de5fc9cf7	mmc: core: Add quirk for long data read time Adds a quirk that sets the data read timeout to a fixed value instead of relying on the information in the CSD. The timeout value chosen is 300ms since that has proven enough for the problematic cards found, but could be increased if other cards require this. This patch also enables this quirk for certain Micron cards known to have this problem. Signed-off-by: Stefan Nilsson XK <stefan.xk.nilsson@stericsson.com> Signed-off-by: Ulf Hansson <ulf.hansson@stericsson.com> Acked-by: Linus Walleij <linus.walleij@linaro.org> Cc: <stable@kernel.org> Signed-off-by: Chris Ball <cjb@laptop.org>	2011-12-10 16:18:35 -05:00
Linus Torvalds	53523d5263	Merge git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile * git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile: arch/tile: use new generic {enable,disable}_percpu_irq() routines drivers/net/ethernet/tile: use skb_frag_page() API asm-generic/unistd.h: support new process_vm_{readv,write} syscalls arch/tile: fix double-free bug in homecache_free_pages() arch/tile: add a few #includes and an EXPORT to catch up with kernel changes.	2011-12-09 08:08:57 -08:00
Konstantin Khlebnikov	83aeeada7c	vmscan: use atomic-long for shrinker batching Use atomic-long operations instead of looping around cmpxchg(). [akpm@linux-foundation.org: massage atomic.h inclusions] Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org> Cc: Dave Chinner <david@fromorbit.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-12-09 07:50:27 -08:00
Tejun Heo	7bd0b0f0da	memblock: Reimplement memblock allocation using reverse free area iterator Now that all early memory information is in memblock when enabled, we can implement reverse free area iterator and use it to implement NUMA aware allocator which is then wrapped for simpler variants instead of the confusing and inefficient mending of information in separate NUMA aware allocator. Implement for_each_free_mem_range_reverse(), use it to reimplement memblock_find_in_range_node() which in turn is used by all allocators. The visible allocator interface is inconsistent and can probably use some cleanup too. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Yinghai Lu <yinghai@kernel.org>	2011-12-08 10:22:09 -08:00
Tejun Heo	0ee332c145	memblock: Kill early_node_map[] Now all ARCH_POPULATES_NODE_MAP archs select HAVE_MEBLOCK_NODE_MAP - there's no user of early_node_map[] left. Kill early_node_map[] and replace ARCH_POPULATES_NODE_MAP with HAVE_MEMBLOCK_NODE_MAP. Also, relocate for_each_mem_pfn_range() and helper from mm.h to memblock.h as page_alloc.c would no longer host an alternative implementation. This change is ultimately one to one mapping and shouldn't cause any observable difference; however, after the recent changes, there are some functions which now would fit memblock.c better than page_alloc.c and dependency on HAVE_MEMBLOCK_NODE_MAP instead of HAVE_MEMBLOCK doesn't make much sense on some of them. Further cleanups for functions inside HAVE_MEMBLOCK_NODE_MAP in mm.h would be nice. -v2: Fix compile bug introduced by mis-spelling CONFIG_HAVE_MEMBLOCK_NODE_MAP to CONFIG_MEMBLOCK_HAVE_NODE_MAP in mmzone.h. Reported by Stephen Rothwell. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Tony Luck <tony.luck@intel.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Chen Liqin <liqin.chen@sunplusct.com> Cc: Paul Mundt <lethal@linux-sh.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: "H. Peter Anvin" <hpa@zytor.com>	2011-12-08 10:22:09 -08:00
Tejun Heo	7fb0bc3f06	memblock: Implement memblock_add_node() Implement memblock_add_node() which can add a new memblock memory region with specific node ID. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Yinghai Lu <yinghai@kernel.org>	2011-12-08 10:22:08 -08:00
Tejun Heo	1aadc0560f	memblock: s/memblock_analyze()/memblock_allow_resize()/ and update users The only function of memblock_analyze() is now allowing resize of memblock region arrays. Rename it to memblock_allow_resize() and update its users. * The following users remain the same other than renaming. arm/mm/init.c::arm_memblock_init() microblaze/kernel/prom.c::early_init_devtree() powerpc/kernel/prom.c::early_init_devtree() openrisc/kernel/prom.c::early_init_devtree() sh/mm/init.c::paging_init() sparc/mm/init_64.c::paging_init() unicore32/mm/init.c::uc32_memblock_init() * In the following users, analyze was used to update total size which is no longer necessary. powerpc/kernel/machine_kexec.c::reserve_crashkernel() powerpc/kernel/prom.c::early_init_devtree() powerpc/mm/init_32.c::MMU_init() powerpc/mm/tlb_nohash.c::__early_init_mmu() powerpc/platforms/ps3/mm.c::ps3_mm_add_memory() powerpc/platforms/embedded6xx/wii.c::wii_memory_fixups() sh/kernel/machine_kexec.c::reserve_crashkernel() * x86/kernel/e820.c::memblock_x86_fill() was directly setting memblock_can_resize before populating memblock and calling analyze afterwards. Call memblock_allow_resize() before start populating. memblock_can_resize is now static inside memblock.c. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Russell King <linux@arm.linux.org.uk> Cc: Michal Simek <monstr@monstr.eu> Cc: Paul Mundt <lethal@linux-sh.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Guan Xuetao <gxt@mprc.pku.edu.cn> Cc: "H. Peter Anvin" <hpa@zytor.com>	2011-12-08 10:22:08 -08:00
Tejun Heo	1440c4e2c9	memblock: Track total size of regions automatically Total size of memory regions was calculated by memblock_analyze() requiring explicitly calling the function between operations which can change memory regions and possible users of total size, which is cumbersome and fragile. This patch makes each memblock_type track total size automatically with minor modifications to memblock manipulation functions and remove requirements on calling memblock_analyze(). [__]memblock_dump_all() now also dumps the total size of reserved regions. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Yinghai Lu <yinghai@kernel.org>	2011-12-08 10:22:08 -08:00
Tejun Heo	fe091c208a	memblock: Kill memblock_init() memblock_init() initializes arrays for regions and memblock itself; however, all these can be done with struct initializers and memblock_init() can be removed. This patch kills memblock_init() and initializes memblock with struct initializer. The only difference is that the first dummy entries don't have .nid set to MAX_NUMNODES initially. This doesn't cause any behavior difference. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Russell King <linux@arm.linux.org.uk> Cc: Michal Simek <monstr@monstr.eu> Cc: Paul Mundt <lethal@linux-sh.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Guan Xuetao <gxt@mprc.pku.edu.cn> Cc: "H. Peter Anvin" <hpa@zytor.com>	2011-12-08 10:22:07 -08:00
Tejun Heo	c5a1cb284b	memblock: Kill sentinel entries at the end of static region arrays memblock no longer depends on having one more entry at the end during addition making the sentinel entries at the end of region arrays not too useful. Remove the sentinels. This eases further updates. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Yinghai Lu <yinghai@kernel.org>	2011-12-08 10:22:07 -08:00
Tejun Heo	4ff7b82f1e	memblock: Add __memblock_dump_all() Add __memblock_dump_all() which dumps memblock configuration whether memblock_debug is enabled or not. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Yinghai Lu <yinghai@kernel.org>	2011-12-08 10:22:06 -08:00
Tejun Heo	581adcbe12	memblock: Make memblock_{add\|remove\|free\|reserve}() return int and update prototypes memblock_{add\|remove\|free\|reserve}() return either 0 or -errno but had long as return type. Chage it to int. Also, drop 'extern' from all prototypes in memblock.h - they are unnecessary and used inconsistently (especially if mm.h is included in the picture). Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Yinghai Lu <yinghai@kernel.org>	2011-12-08 10:22:06 -08:00
Linus Torvalds	34a9d2c39a	Merge branch '3.2-rc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending * '3.2-rc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: (25 commits) iscsi-target: Fix hex2bin warn_unused compile message target: Don't return an error if disabling unsupported features target/rd: fix or rewrite the copy routine target/rd: simplify the page/offset computation target: remove the unused se_dev_list target/file: walk properly over sg list target: remove unused struct fields target: Fix page length in emulated INQUIRY VPD page 86h target: Handle 0 correctly in transport_get_sectors_6() target: Don't return an error status for 0-length READ and WRITE iscsi-target: Use kmemdup rather than duplicating its implementation iscsi-target: Add missing F_BIT for iscsi_tm_rsp iscsi-target: Fix residual count hanlding + remove iscsi_cmd->residual_count target: Reject SCSI data overflow for fabrics using transport_generic_map_mem_to_cmd target: remove the unused t_task_pt_sgl and t_task_pt_sgl_num se_cmd fields target: remove the t_tasks_bidi se_cmd field target: remove the t_tasks_fua se_cmd field target: remove the se_ordered_node se_cmd field target: remove the se_obj_ptr and se_orig_obj_ptr se_cmd fields target: Drop config_item_name usage in fabric TFO->free_wwn() ...	2011-12-07 18:18:27 -08:00
Linus Torvalds	3172f8fe1c	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: fix apparmor dereferencing potentially freed dentry, sanitize __d_path() API	2011-12-07 08:14:42 -08:00
Al Viro	02125a8264	fix apparmor dereferencing potentially freed dentry, sanitize __d_path() API __d_path() API is asking for trouble and in case of apparmor d_namespace_path() getting just that. The root cause is that when __d_path() misses the root it had been told to look for, it stores the location of the most remote ancestor in root. Without grabbing references. Sure, at the moment of call it had been pinned down by what we have in path. And if we raced with umount -l, we could have very well stopped at vfsmount/dentry that got freed as soon as prepend_path() dropped vfsmount_lock. It is safe to compare these pointers with pre-existing (and known to be still alive) vfsmount and dentry, as long as all we are asking is "is it the same address?". Dereferencing is not safe and apparmor ended up stepping into that. d_namespace_path() really wants to examine the place where we stopped, even if it's not connected to our namespace. As the result, it looked at ->d_sb->s_magic of a dentry that might've been already freed by that point. All other callers had been careful enough to avoid that, but it's really a bad interface - it invites that kind of trouble. The fix is fairly straightforward, even though it's bigger than I'd like: * prepend_path() root argument becomes const. * __d_path() is never called with NULL/NULL root. It was a kludge to start with. Instead, we have an explicit function - d_absolute_root(). Same as __d_path(), except that it doesn't get root passed and stops where it stops. apparmor and tomoyo are using it. * __d_path() returns NULL on path outside of root. The main caller is show_mountinfo() and that's precisely what we pass root for - to skip those outside chroot jail. Those who don't want that can (and do) use d_path(). * __d_path() root argument becomes const. Everyone agrees, I hope. * apparmor does NOT try to use __d_path() or any of its variants when it sees that path->mnt is an internal vfsmount. In that case it's definitely not mounted anywhere and dentry_path() is exactly what we want there. Handling of sysctl()-triggered weirdness is moved to that place. * if apparmor is asked to do pathname relative to chroot jail and __d_path() tells it we it's not in that jail, the sucker just calls d_absolute_path() instead. That's the other remaining caller of __d_path(), BTW. * seq_path_root() does _NOT_ return -ENAMETOOLONG (it's stupid anyway - the normal seq_file logics will take care of growing the buffer and redoing the call of ->show() just fine). However, if it gets path not reachable from root, it returns SEQ_SKIP. The only caller adjusted (i.e. stopped ignoring the return value as it used to do). Reviewed-by: John Johansen <john.johansen@canonical.com> ACKed-by: John Johansen <john.johansen@canonical.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Cc: stable@vger.kernel.org	2011-12-06 23:57:18 -05:00
Linus Torvalds	091c0f86ba	Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: ftrace: Fix hash record accounting bug perf: Fix parsing of __print_flags() in TP_printk() jump_label: jump_label_inc may return before the code is patched ftrace: Remove force undef config value left for testing tracing: Restore system filter behavior tracing: fix event_subsystem ref counting	2011-12-06 11:54:33 -08:00
Peter Zijlstra	fdaabd800b	sched: Fix compile error for UP,!NOHZ Commit `69e1e811` ("sched, nohz: Track nr_busy_cpus in the sched_group_power") messed up the static inline function definition. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Suresh Siddha <suresh.b.siddha@intel.com> Link: http://lkml.kernel.org/n/tip-abjah8ctq5qrjjtdiabe8lph@git.kernel.org Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-12-06 20:51:18 +01:00
Peter Zijlstra	ac99b862fb	jump_label: Provide jump_label_key initializers Provide two initializers for jump_label_key that initialize it enabled or disabled. Also modify all jump_label code to allow for jump_labels to be initialized enabled. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Jason Baron <jbaron@redhat.com> Link: http://lkml.kernel.org/n/tip-p40e3yj21b68y03z1yv825e7@git.kernel.org Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-12-06 20:41:03 +01:00
Glauber Costa	3292beb340	sched/accounting: Change cpustat fields to an array This patch changes fields in cpustat from a structure, to an u64 array. Math gets easier, and the code is more flexible. Signed-off-by: Glauber Costa <glommer@parallels.com> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Paul Tuner <pjt@google.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1322498719-2255-2-git-send-email-glommer@parallels.com Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-12-06 09:06:38 +01:00
Suresh Siddha	69e1e811dc	sched, nohz: Track nr_busy_cpus in the sched_group_power Introduce nr_busy_cpus in the struct sched_group_power [Not in sched_group because sched groups are duplicated for the SD_OVERLAP scheduler domain] and for each cpu that enters and exits idle, this parameter will be updated in each scheduler group of the scheduler domain that this cpu belongs to. To avoid the frequent update of this state as the cpu enters and exits idle, the update of the stat during idle exit is delayed to the first timer tick that happens after the cpu becomes busy. This is done using NOHZ_IDLE flag in the struct rq's nohz_flags. Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20111202010832.555984323@sbsiddha-desk.sc.intel.com Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-12-06 09:06:32 +01:00
Andrew Vagin	b781a602ac	events, sched: Add tracepoint for accounting blocked time This tracepoint shows how long a task is sleeping in uninterruptible state. E.g. it may show how long and where a mutex is waited for. Signed-off-by: Andrew Vagin <avagin@openvz.org> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1322471015-107825-8-git-send-email-avagin@openvz.org Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-12-06 08:51:23 +01:00
Gleb Natapov	b202952075	perf, core: Rate limit perf_sched_events jump_label patching jump_lable patching is very expensive operation that involves pausing all cpus. The patching of perf_sched_events jump_label is easily controllable from userspace by unprivileged user. When te user runs a loop like this: "while true; do perf stat -e cycles true; done" ... the performance of my test application that just increments a counter for one second drops by 4%. This is on a 16 cpu box with my test application using only one of them. An impact on a real server doing real work will be worse. Performance of KVM PMU drops nearly 50% due to jump_lable for "perf record" since KVM PMU implementation creates and destroys perf event frequently. This patch introduces a way to rate limit jump_label patching and uses it to fix the above problem. I believe that as jump_label use will spread the problem will become more common and thus solving it in a generic code is appropriate. Also fixing it in the perf code would result in moving jump_label accounting logic to perf code with all the ifdefs in case of JUMP_LABEL=n kernel. With this patch all details are nicely hidden inside jump_label code. Signed-off-by: Gleb Natapov <gleb@redhat.com> Acked-by: Jason Baron <jbaron@redhat.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20111127155909.GO2557@redhat.com Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-12-06 08:34:02 +01:00
Robert Richter	1e2ad28f80	perf, x86: Implement event scheduler helper functions This patch introduces x86 perf scheduler code helper functions. We need this to later add more complex functionality to support overlapping counter constraints (next patch). The algorithm is modified so that the range of weight values is now generated from the constraints. There shouldn't be other functional changes. With the helper functions the scheduler is controlled. There are functions to initialize, traverse the event list, find unused counters etc. The scheduler keeps its own state. V3: * Added macro for_each_set_bit_cont(). * Changed functions interfaces of perf_sched_find_counter() and perf_sched_next_event() to use bool as return value. * Added some comments to make code better understandable. V4: * Fix broken event assignment if weight of the first event is not wmin (perf_sched_init()). Signed-off-by: Robert Richter <robert.richter@amd.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/1321616122-1533-2-git-send-email-robert.richter@amd.com Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-12-06 08:33:54 +01:00
Peter Zijlstra	0f5a260128	perf: Avoid a useless pmu_disable() in the perf-tick Gleb writes: > Currently pmu is disabled and re-enabled on each timer interrupt even > when no rotation or frequency adjustment is needed. On Intel CPU this > results in two writes into PERF_GLOBAL_CTRL MSR per tick. On bare metal > it does not cause significant slowdown, but when running perf in a virtual > machine it leads to 20% slowdown on my machine. Cure this by keeping a perf_event_context::nr_freq counter that counts the number of active events that require frequency adjustments and use this in a similar fashion to the already existing nr_events != nr_active test in perf_rotate_context(). By being able to exclude both rotation and frequency adjustments a-priory for the common case we can avoid the otherwise superfluous PMU disable. Suggested-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/n/tip-515yhoatehd3gza7we9fapaa@git.kernel.org Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-12-06 08:33:52 +01:00
Christoph Hellwig	6f21475576	target: remove the unused se_dev_list Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>	2011-12-06 06:00:57 +00:00
Jörn Engel	5c73b678f7	target: remove unused struct fields Some are never used, some are set but never read, dev_hoq_count is incremented and decremented, but never read. Signed-off-by: Joern Engel <joern@logfs.org> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>	2011-12-06 06:00:56 +00:00
Christoph Hellwig	6fd126ffeb	target: remove the unused t_task_pt_sgl and t_task_pt_sgl_num se_cmd fields Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>	2011-12-06 06:00:53 +00:00
Christoph Hellwig	33c3fafc43	target: remove the t_tasks_bidi se_cmd field And use a SCF_BIDI flag instead. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>	2011-12-06 06:00:53 +00:00
Christoph Hellwig	2d3a4b51df	target: remove the t_tasks_fua se_cmd field And use a SCF_FUA flag instead. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>	2011-12-06 06:00:52 +00:00
Christoph Hellwig	aad13ca20d	target: remove the se_ordered_node se_cmd field We never walk ordered_cmd_list in the se_device, so remove all code related to supporting it. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>	2011-12-06 06:00:52 +00:00
Christoph Hellwig	58a2801a4b	target: remove the se_obj_ptr and se_orig_obj_ptr se_cmd fields We already have a perfectly valid se_device pointer in the command, so remove the mostly useless duplicates. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>	2011-12-06 06:00:52 +00:00
Bart Van Assche	5f655e8d2a	target: Avoid compiler warnings about signed one-bit bitfields Convert to unsigned bit fields for active I/O shutdown fields. Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>	2011-12-06 06:00:50 +00:00
Nicholas Bellinger	03e98c9eb9	target: Address legacy PYX_TRANSPORT_* return code breakage This patch removes legacy usage of PYX_TRANSPORT_* return codes in a number of locations and addresses cases where transport_generic_request_failure() was returning the incorrect sense upon CHECK_CONDITION status after the v3.1 converson to use errno return codes. This includes the conversion of transport_generic_request_failure() to process cmd->scsi_sense_reason and handle extra TCM_RESERVATION_CONFLICT before calling transport_send_check_condition_and_sense() to queue up response status. It also drops PYX_TRANSPORT_OUT_OF_MEMORY_RESOURCES legacy usgae, and returns TCM_LOGICAL_UNIT_COMMUNICATION_FAILURE w/ a response for these cases. transport_generic_allocate_tasks(), transport_generic_new_cmd(), backend SCF_SCSI_DATA_SG_IO_CDB ->do_task(), and emulated ->execute_task() have all been updated to set se_cmd->scsi_sense_reason and return errno codes universally upon failure. This includes cmd->scsi_sense_reason assignment in target_core_alua.c, target_core_pr.c and target_core_cdb.c emulation code. Finally it updates fabric modules to remove the legacy usage, and for TFO->new_cmd_map() callers forwards return values outside of fabric code. iscsi-target has also been updated to remove a handful of special cases related to the cleanup and signaling QUEUE_FULL handling w/ ft_write_pending() (v2: Drop extra SCF_SCSI_CDB_EXCEPTION check during failure from transport_generic_new_cmd, and re-add missing task->task_error_status assignment in transport_complete_task) Cc: Christoph Hellwig <hch@lst.de> Cc: stable@kernel.org Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>	2011-12-06 06:00:49 +00:00
Linus Torvalds	45e713efe2	Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: intr_remapping: Fix section mismatch in ir_dev_scope_init() intel-iommu: Fix section mismatch in dmar_parse_rmrr_atsr_dev() x86, amd: Fix up numa_node information for AMD CPU family 15h model 0-0fh northbridge functions x86, AMD: Correct align_va_addr documentation x86/rtc, mrst: Don't register a platform RTC device for for Intel MID platforms x86/mrst: Battery fixes x86/paravirt: PTE updates in k(un)map_atomic need to be synchronous, regardless of lazy_mmu mode x86: Fix "Acer Aspire 1" reboot hang x86/mtrr: Resolve inconsistency with Intel processor manual x86: Document rdmsr_safe restrictions x86, microcode: Fix the failure path of microcode update driver init code Add TAINT_FIRMWARE_WORKAROUND on MTRR fixup x86/mpparse: Account for bus types other than ISA and PCI x86, mrst: Change the pmic_gpio device type to IPC mrst: Added some platform data for the SFI translations x86,mrst: Power control commands update x86/reboot: Blacklist Dell OptiPlex 990 known to require PCI reboot x86, UV: Fix UV2 hub part number x86: Add user_mode_vm check in stack_overflow_check	2011-12-05 16:54:15 -08:00
Linus Torvalds	232ea34455	Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf: Fix loss of notification with multi-event perf, x86: Force IBS LVT offset assignment for family 10h perf, x86: Disable PEBS on SandyBridge chips trace_events_filter: Use rcu_assign_pointer() when setting ftrace_event_call->filter perf session: Fix crash with invalid CPU list perf python: Fix undefined symbol problem perf/x86: Enable raw event access to Intel offcore events perf: Don't use -ENOSPC for out of PMU resources perf: Do not set task_ctx pointer in cpuctx if there are no events in the context perf/x86: Fix PEBS instruction unwind oprofile, x86: Fix crash when unloading module (nmi timer mode) oprofile: Fix crash when unloading module (hr timer mode)	2011-12-05 16:54:00 -08:00
Linus Torvalds	7125faceab	Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched, x86: Avoid unnecessary overflow in sched_clock sched: Fix buglet in return_cfs_rq_runtime() sched: Avoid SMT siblings in select_idle_sibling() if possible sched: Set the command name of the idle tasks in SMP kernels sched, rt: Provide means of disabling cross-cpu bandwidth sharing sched: Document wait_for_completion_*() return values sched_fair: Fix a typo in the comment describing update_sd_lb_stats sched: Add a comment to effective_load() since it's a pain	2011-12-05 16:50:24 -08:00

1 2 3 4 5 ...

46,601 commits