linux-pinenote

Author	SHA1	Message	Date
Thomas Renninger	a0ad05c75a	Introduce FW_BUG, FW_WARN and FW_INFO to consistenly tell users about BIOS bugs The idea is to add this to printk after the severity: printk(KERN_ERR FW_BUG "This is not our fault, BIOS developer: fix it by simply add ...\n"); If a Firmware issue should be hidden, because it is work-arounded, but you still want to see something popping up e.g. for info only: printk(KERN_INFO FW_INFO "This is done stupid, we can handle it, but it should better be avoided in future\n"); or on the Linuxfirmwarekit to tell vendors that they did something stupid or wrong without bothering the user: printk(KERN_INFO FW_BUG "This is done stupid, we can handle it, but it should better be avoided in future\n"); Some use cases: - If a user sees a [Firmware Bug] message in the kernel he should first update the BIOS before wasting time with debugging and submiting on old firmware code to mailing lists. - The linuxfirmwarekit (http://www.linuxfirmwarekit.org) tries to detect firmware bugs. It currently is doing that in userspace which results in: - Huge test scripts that could be a one liner in the kernel - A lot of BIOS bugs are already absorbed by the kernel What do we need such a stupid linuxfirmwarekit for? - Vendors: Can test their BIOSes for Linux compatibility. There will be the time when vendors realize that the test utils on Linux are more strict and using them increases the qualitity and stability of their products. - Vendors: Can easily fix up their BIOSes and be more Linux compatible by: dmesg \|grep "Firmware Bug" and send the result to their BIOS developer colleagues who should know what the messages are about and how to fix them, without the need of studying kernel code. - Distributions: can do a first automated HW/BIOS checks. This can then be done without the need of asking kernel developers who need to dig down the code and explain the details. Certification can/will just be rejected until dmesg \|grep "Firmware Bug" is empty. - Thus this can be used as an instrument to enforce cleaner BIOS code. Currently every stupid Windows ACPI bug is re-implemented in Linux which is a rather unfortunate situation. We already have the power to avoid this in e.g. memory or cpu hot-plug ACPI implementations, because Linux certification is a must for most vendors in the server area. Working towards being able to do that in the laptop area (vendors are starting to look at Linux here also and will use this tool) is the goal. At least provide them a tool to make it as easy for this guys (e.g. not needing to browse kernel code) as possible. - The ordinary Linux user: can go into the next shop, boots the firmwarekit on his most preferred machines. He chooses one without BIOS bugs. Unsupported HW is ok, he likes to try out latest projects which might support them or likes to dig on it on his own, but he hates to workaround broken BIOSes like hell. I double checked with the firmwarekit. There they have: So the mapping generally is (also depending on how likely the BIOS is to blame, this could sometimes be difficult): FW_INFO = INFO FW_WARN = WARN FW_BUG = FAIL For more info about the linuxfirmwarekit and why this is needed can be found here: http://www.linuxfirmwarekit.org While severity matches with the firmwarekit, it might be tricky to hide messages from the user. E.g. we recently found out that on HP BIOSes negative temperatures are returned, which seem to indicate that the thermal zone is invalid. We can work around that gracefully by ignoring the thermal zone and we do not want to bother the ordinary user with a frightening message: Firmware Bug: thermal management absolutely broken but want to hide it from the user. But in the linuxfirmwarekit this should be shown as a real show stopper (the temperatures could really be wrong, broken thermal management is one of the worst things that can happen and the BIOS guys of the machine must implement this properly). It is intended to do that (hide it from the user with KERN_INFO msg, but still print it as a BIOS bug) by: printk(KERN_INFO FW_BUG "Negativ temperature values detected. Try to workarounded, BIOS must get fixed\n"); Hope that works out..., no idea how to better hide it as printk is the only way to easily provide this functionality. Signed-off-by: Thomas Renninger <trenn@suse.de> Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Len Brown <len.brown@intel.com>	2008-09-22 18:42:51 -04:00
FUJITA Tomonori	d26dbc5cf9	iommu: export iommu_area_reserve helper function x86 has set_bit_string() that does the exact same thing that set_bit_area() in lib/iommu-helper.c does. This patch exports set_bit_area() in lib/iommu-helper.c as iommu_area_reserve(), converts GART, Calgary, and AMD IOMMU to use it. Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Acked-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-09-22 16:47:50 +02:00
Peter Zijlstra	15afe09bf4	sched: wakeup preempt when small overlap Lin Ming reported a 10% OLTP regression against 2.6.27-rc4. The difference seems to come from different preemption agressiveness, which affects the cache footprint of the workload and its effective cache trashing. Aggresively preempt a task if its avg overlap is very small, this should avoid the task going to sleep and find it still running when we schedule back to it - saving a wakeup. Reported-by: Lin Ming <ming.m.lin@intel.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-09-22 16:28:32 +02:00
Mark McLoughlin	b91c4996df	hrtimer: remove hrtimer_clock_base::reprogram() hrtimer_clock_base::reprogram() also appears to never have been used, so remove it. Signed-off-by: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-09-22 13:52:42 +02:00
Mark McLoughlin	d7cfb60c5c	hrtimer: remove hrtimer_clock_base::get_softirq_time() Peter Zijlstra noticed this 8 months ago and I just noticed it again. hrtimer_clock_base::get_softirq_time() is currently unused in the entire tree. In fact, looking at the logs, it appears as if it was never used. Remove it. Signed-off-by: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-09-22 13:52:42 +02:00
David S. Miller	38783e6713	isdn: isdn_ppp: Use SKB list facilities instead of home-grown implementation. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-09-22 01:15:02 -07:00
Steven Whitehouse	6d80c39f91	GFS2: Add UUID to GFS2 sb This patch adds a UUID to the GFS2 sb structure. This field is not actually referenced from kernel space at all, but is added for completeness and due to the userland tools which get their on-disk structure information from the gfs2_ondisk.h header file. Since we have to be backwards compatible, we will assume that any GFS2 sb for which the UUID is all 0 does not have a UUID as such. We should then be (after some userland changes) able to support the -U mount option. This addresses Fedora bugzilla #242689 Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2008-09-22 07:29:31 +01:00
David S. Miller	67fed45930	net: Add new interfaces for SKB list light-weight init and splicing. This will be used by subsequent changesets. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-09-21 22:36:24 -07:00
James Morris	ab2b49518e	Merge branch 'master' into next Conflicts: MAINTAINERS Thanks for breaking my tree :-) Signed-off-by: James Morris <jmorris@namei.org>	2008-09-21 17:41:56 -07:00
Ilpo Järvinen	0e1c54c2a4	tcp: reorganize retransmit code loops Both loops are quite similar, so they can be combined with little effort. As a result, forward_skb_hint becomes obsolete as well. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-09-20 21:24:21 -07:00
Ilpo Järvinen	006f582c73	tcp: convert retransmit_cnt_hint to seqno Main benefit in this is that we can then freely point the retransmit_skb_hint to anywhere we want to because there's no longer need to know what would be the count changes involve, and since this is really used only as a terminator, unnecessary work is one time walk at most, and if some retransmissions are necessary after that point later on, the walk is not full waste of time anyway. Since retransmit_high must be kept valid, all lost markers must ensure that. Now I also have learned how those "holes" in the rexmittable skbs can appear, mtu probe does them. So I removed the misleading comment as well. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-09-20 21:20:20 -07:00
Linus Torvalds	5a0cd4eb66	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: IPoIB: Fix deadlock on RTNL between bcast join comp and ipoib_stop() RDMA/nes: Fix client side QP destroy IB/mlx4: Fix up fast register page list format mlx4_core: Set RAE and init mtt_sz field in FRMR MPT entries	2008-09-19 16:18:21 -07:00
David S. Miller	d950f264ff	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6	2008-09-19 16:17:12 -07:00
FUJITA Tomonori	07a2c01a0c	convert swiotlb to use dma_get_mask swiotlb can use dma_get_mask() instead of the homegrown function. Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Cc: tony.luck@intel.com Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-09-19 10:20:41 +02:00
Lennert Buytenhek	4fd5f812c2	phylib: allow incremental scanning of an mii bus This patch splits the bus scanning code in mdiobus_register() off into a separate function, and makes this function available for calling from external code. This allows incrementally scanning an mii bus, e.g. as information about which addresses are 'safe' to scan becomes available. Signed-off-by: Lennert Buytenhek <buytenh@marvell.com> Acked-by: Andy Fleming <afleming@freescale.com>	2008-09-19 05:13:54 +02:00
Scott Feldman	01f2e4ead2	enic: add Cisco 10G Ethernet NIC driver Signed-off-by: Scott Feldman <scofeldm@cisco.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>	2008-09-18 11:34:53 -04:00
Chris Snook	452c1ce218	atl2: add atl2 driver Driver for Atheros L2 10/100 network device. Includes necessary changes for Kconfig, Makefile, and pci_ids.h. Signed-off-by: Chris Snook <csnook@redhat.com> Signed-off-by: Jay Cliburn <jacliburn@bellsouth.net> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>	2008-09-18 11:34:52 -04:00
Eric Miao	6ae19b04ab	Input: ads7846 - introduce .gpio_pendown to get pendown state The GPIO connected to ADS7846 nPENIRQ signal is usually used to get the pendown state as well. Introduce a .gpio_pendown, and use this to decide the pendown state if .get_pendown_state is NULL. Signed-off-by: Eric Miao <eric.miao@marvell.com> Signed-off-by: Dmitry Torokhov <dtor@mail.ru>	2008-09-17 17:33:37 +01:00
David Vrabel	b60066c141	uwb: add symlinks in sysfs between radio controllers and PALs Add a facility for PALs to have symlinks to their radio controller (and vice-versa) and make WUSB host controllers use this. Signed-off-by: David Vrabel <david.vrabel@csr.com>	2008-09-17 16:54:35 +01:00
Inaky Perez-Gonzalez	c7f736484f	wusb: add the Wireless USB include files. Common header files derived from the WUSB 1.0 specification. Signed-off-by: David Vrabel <david.vrabel@csr.com>	2008-09-17 16:54:29 +01:00
David Vrabel	8f1b678ab9	uwb: add the driver to enumerate WHCI capabilities This enumerates the capabilties of a WHCI device, adding a umc device for each one. Signed-off-by: David Vrabel <david.vrabel@csr.com>	2008-09-17 16:54:26 +01:00
David Vrabel	da389eac31	uwb: add the umc bus The UMC bus is used for the capabilities exposed by a UWB Multi-interface Controller as described in the WHCI specification. Signed-off-by: David Vrabel <david.vrabel@csr.com>	2008-09-17 16:54:25 +01:00
Inaky Perez-Gonzalez	34e95e41f1	uwb: add the uwb include files Signed-off-by: David Vrabel <david.vrabel@csr.com>	2008-09-17 16:54:23 +01:00
David Vrabel	ccbe329bcd	bitmap: add bitmap_copy_le() bitmap_copy_le() copies a bitmap, putting the bits into little-endian order (i.e., each unsigned long word in the bitmap is put into little-endian order). The UWB stack used bitmaps to manage Medium Access Slot availability, and these bitmaps need to be written to the hardware in LE order. Signed-off-by: David Vrabel <david.vrabel@csr.com>	2008-09-17 16:54:22 +01:00
David Miller	ef3d7714f6	Fix PNP build failure, bugzilla #11276 This fill fix the following regression list entry: Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=11276 Subject : build error: CONFIG_OPTIMIZE_INLINING=y causes gcc 4.2 to do stupid things Submitter : Randy Dunlap <randy.dunlap@oracle.com> Date : 2008-08-06 17:18 (38 days old) References : http://marc.info/?l=linux-kernel&m=121804329014332&w=4 http://lkml.org/lkml/2008/7/22/353 Handled-By : Bjorn Helgaas <bjorn.helgaas@hp.com> Patch : http://lkml.org/lkml/2008/7/22/364 with what I believe is a better fix than the one referenced in the regression entry above. These PNP header interfaces try to work in such a way that you can reference some of them even if PNP is not enabled, and the compiler was expected to optimize everything away. Which is mostly fine, except that there was one interface for which there was not provided an inline "NOP" implementation. Once we add that, all of these compile failures cannot handle any more. pnp: Provide NOP inline implementation of pnp_get_resource() when !PNP Fixes kernel bugzilla #11276. Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-09-16 19:35:05 -07:00
Greg KH	b08508c40a	PCI: fix compiler warnings in pci_get_subsys() pci_get_subsys() changed in 2.6.26 so that the from pointer is modified when the call is being invoked, so fix up the 'const' marking of it that the compiler is complaining about. Reported-by: Rufus & Azrael <rufus-azrael@numericable.fr> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>	2008-09-16 15:52:08 -07:00
David S. Miller	2e57572a50	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/sparc-2.6 Conflicts: arch/sparc64/kernel/pci_psycho.c	2008-09-16 14:11:43 -07:00
Ingo Molnar	e3bbaa3cb6	Merge commit 'v2.6.27-rc6' into x86/memory-corruption-check	2008-09-16 09:34:23 +02:00
Vladimir Sokolovsky	29bdc88384	IB/mlx4: Fix up fast register page list format Byte swap the addresses in the page list for fast register work requests to big endian to match what the HCA expectx. Also, the addresses must have the "present" bit set so that the HCA knows it can access them. Otherwise the HCA will fault the first time it accesses the memory region. Signed-off-by: Vladimir Sokolovsky <vlad@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2008-09-15 14:25:23 -07:00
Luis R. Rodriguez	b2e1b30290	cfg80211: Add new wireless regulatory infrastructure This adds the new wireless regulatory infrastructure. The main motiviation behind this was to centralize regulatory code as each driver was implementing their own regulatory solution, and to replace the initial centralized code we have where: * only 3 regulatory domains are supported: US, JP and EU * regulatory domains can only be changed through module parameter * all rules were built statically in the kernel We now have support for regulatory domains for many countries and regulatory domains are now queried through a userspace agent through udev allowing distributions to update regulatory rules without updating the kernel. Each driver can regulatory_hint() a regulatory domain based on either their EEPROM mapped regulatory domain value to a respective ISO/IEC 3166-1 country code or pass an internally built regulatory domain. We also add support to let the user set the regulatory domain through userspace in case of faulty EEPROMs to further help compliance. Support for world roaming will be added soon for cards capable of this. For more information see: http://wireless.kernel.org/en/developers/Regulatory/CRDA For now we leave an option to enable the old module parameter, ieee80211_regdom, and to build the 3 old regdomains statically (US, JP and EU). This option is CONFIG_WIRELESS_OLD_REGULATORY. These old static definitions and the module parameter is being scheduled for removal for 2.6.29. Note that if you use this you won't make use of a world regulatory domain as its pointless. If you leave this option enabled and if CRDA is present and you use US or JP we will try to ask CRDA to update us a regulatory domain for us. Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2008-09-15 16:48:19 -04:00
Ingo Molnar	83bd6998b0	Merge commit 'v2.6.27-rc6' into timers/hpet	2008-09-14 18:24:00 +02:00
Jeremy Fitzhardinge	8308c54d7e	generic: redefine resource_size_t as phys_addr_t There's no good reason why a resource_size_t shouldn't just be a physical address, so simply redefine it in terms of phys_addr_t. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-09-14 17:24:27 +02:00
Jeremy Fitzhardinge	947d0496cf	generic: make PFN_PHYS explicitly return phys_addr_t PFN_PHYS, as its name suggests, turns a pfn into a physical address. However, it is a macro which just operates on its argument without modifying its type. pfns are typed unsigned long, but an unsigned long may not be long enough to hold a physical address (32-bit systems with more than 32 bits of physcial address). Make sure we cast to phys_addr_t to return a complete result. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-09-14 17:24:26 +02:00
Jeremy Fitzhardinge	600715dcdf	generic: add phys_addr_t for holding physical addresses Add a kernel-wide "phys_addr_t" which is guaranteed to be able to hold any physical address. By default it equals the word size of the architecture, but a 32-bit architecture can set ARCH_PHYS_ADDR_T_64BIT if it needs a 64-bit phys_addr_t. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-09-14 17:24:25 +02:00
Ingo Molnar	9dfed08eb4	Merge commit 'v2.6.27-rc6' into core/resources	2008-09-14 17:23:29 +02:00
Ingo Molnar	5ce73a4a5a	timers: fix itimer/many thread hang, cleanups Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-09-14 17:11:46 +02:00
Ingo Molnar	0a8eaa4f9b	timers: fix itimer/many thread hang, fix #2 fix the UP build: In file included from arch/x86/kernel/asm-offsets_32.c:9, from arch/x86/kernel/asm-offsets.c:3: include/linux/sched.h: In function ‘thread_group_cputime_clone_thread’: include/linux/sched.h:2272: warning: no return statement in function returning non-void include/linux/sched.h: In function ‘thread_group_cputime_account_user’: include/linux/sched.h:2284: error: invalid type argument of ‘->’ (have ‘struct task_cputime’) include/linux/sched.h:2284: error: invalid type argument of ‘->’ (have ‘struct task_cputime’) include/linux/sched.h: In function ‘thread_group_cputime_account_system’: include/linux/sched.h:2291: error: invalid type argument of ‘->’ (have ‘struct task_cputime’) include/linux/sched.h:2291: error: invalid type argument of ‘->’ (have ‘struct task_cputime’) include/linux/sched.h: In function ‘thread_group_cputime_account_exec_runtime’: include/linux/sched.h:2298: error: invalid type argument of ‘->’ (have ‘struct task_cputime’) distcc[14501] ERROR: compile arch/x86/kernel/asm-offsets.c on a/30 failed make[1]: *** [arch/x86/kernel/asm-offsets.s] Error 1 Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-09-14 17:02:43 +02:00
FUJITA Tomonori	589fc9a6e2	iommu: add dma_get_mask helper function Several IOMMUs do the same thing to get the dma_mask of a device. This adds a helper function to do the same thing to sweep them. Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-09-14 16:42:37 +02:00
FUJITA Tomonori	eecfffc154	iommu: add iommu_device_max_index IOMMU helper function This function helps IOMMUs to know the highest address that a device can access to. Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-09-14 16:42:36 +02:00
Frank Mayhar	f06febc96b	timers: fix itimer/many thread hang Overview This patch reworks the handling of POSIX CPU timers, including the ITIMER_PROF, ITIMER_VIRT timers and rlimit handling. It was put together with the help of Roland McGrath, the owner and original writer of this code. The problem we ran into, and the reason for this rework, has to do with using a profiling timer in a process with a large number of threads. It appears that the performance of the old implementation of run_posix_cpu_timers() was at least O(n3) (where "n" is the number of threads in a process) or worse. Everything is fine with an increasing number of threads until the time taken for that routine to run becomes the same as or greater than the tick time, at which point things degrade rather quickly. This patch fixes bug 9906, "Weird hang with NPTL and SIGPROF." Code Changes This rework corrects the implementation of run_posix_cpu_timers() to make it run in constant time for a particular machine. (Performance may vary between one machine and another depending upon whether the kernel is built as single- or multiprocessor and, in the latter case, depending upon the number of running processors.) To do this, at each tick we now update fields in signal_struct as well as task_struct. The run_posix_cpu_timers() function uses those fields to make its decisions. We define a new structure, "task_cputime," to contain user, system and scheduler times and use these in appropriate places: struct task_cputime { cputime_t utime; cputime_t stime; unsigned long long sum_exec_runtime; }; This is included in the structure "thread_group_cputime," which is a new substructure of signal_struct and which varies for uniprocessor versus multiprocessor kernels. For uniprocessor kernels, it uses "task_cputime" as a simple substructure, while for multiprocessor kernels it is a pointer: struct thread_group_cputime { struct task_cputime totals; }; struct thread_group_cputime { struct task_cputime totals; }; We also add a new task_cputime substructure directly to signal_struct, to cache the earliest expiration of process-wide timers, and task_cputime also replaces the it_*_expires fields of task_struct (used for earliest expiration of thread timers). The "thread_group_cputime" structure contains process-wide timers that are updated via account_user_time() and friends. In the non-SMP case the structure is a simple aggregator; unfortunately in the SMP case that simplicity was not achievable due to cache-line contention between CPUs (in one measured case performance was actually _worse_ on a 16-cpu system than the same test on a 4-cpu system, due to this contention). For SMP, the thread_group_cputime counters are maintained as a per-cpu structure allocated using alloc_percpu(). The timer functions update only the timer field in the structure corresponding to the running CPU, obtained using per_cpu_ptr(). We define a set of inline functions in sched.h that we use to maintain the thread_group_cputime structure and hide the differences between UP and SMP implementations from the rest of the kernel. The thread_group_cputime_init() function initializes the thread_group_cputime structure for the given task. The thread_group_cputime_alloc() is a no-op for UP; for SMP it calls the out-of-line function thread_group_cputime_alloc_smp() to allocate and fill in the per-cpu structures and fields. The thread_group_cputime_free() function, also a no-op for UP, in SMP frees the per-cpu structures. The thread_group_cputime_clone_thread() function (also a UP no-op) for SMP calls thread_group_cputime_alloc() if the per-cpu structures haven't yet been allocated. The thread_group_cputime() function fills the task_cputime structure it is passed with the contents of the thread_group_cputime fields; in UP it's that simple but in SMP it must also safely check that tsk->signal is non-NULL (if it is it just uses the appropriate fields of task_struct) and, if so, sums the per-cpu values for each online CPU. Finally, the three functions account_group_user_time(), account_group_system_time() and account_group_exec_runtime() are used by timer functions to update the respective fields of the thread_group_cputime structure. Non-SMP operation is trivial and will not be mentioned further. The per-cpu structure is always allocated when a task creates its first new thread, via a call to thread_group_cputime_clone_thread() from copy_signal(). It is freed at process exit via a call to thread_group_cputime_free() from cleanup_signal(). All functions that formerly summed utime/stime/sum_sched_runtime values from from all threads in the thread group now use thread_group_cputime() to snapshot the values in the thread_group_cputime structure or the values in the task structure itself if the per-cpu structure hasn't been allocated. Finally, the code in kernel/posix-cpu-timers.c has changed quite a bit. The run_posix_cpu_timers() function has been split into a fast path and a slow path; the former safely checks whether there are any expired thread timers and, if not, just returns, while the slow path does the heavy lifting. With the dedicated thread group fields, timers are no longer "rebalanced" and the process_timer_rebalance() function and related code has gone away. All summing loops are gone and all code that used them now uses the thread_group_cputime() inline. When process-wide timers are set, the new task_cputime structure in signal_struct is used to cache the earliest expiration; this is checked in the fast path. Performance The fix appears not to add significant overhead to existing operations. It generally performs the same as the current code except in two cases, one in which it performs slightly worse (Case 5 below) and one in which it performs very significantly better (Case 2 below). Overall it's a wash except in those two cases. I've since done somewhat more involved testing on a dual-core Opteron system. Case 1: With no itimer running, for a test with 100,000 threads, the fixed kernel took 1428.5 seconds, 513 seconds more than the unfixed system, all of which was spent in the system. There were twice as many voluntary context switches with the fix as without it. Case 2: With an itimer running at .01 second ticks and 4000 threads (the most an unmodified kernel can handle), the fixed kernel ran the test in eight percent of the time (5.8 seconds as opposed to 70 seconds) and had better tick accuracy (.012 seconds per tick as opposed to .023 seconds per tick). Case 3: A 4000-thread test with an initial timer tick of .01 second and an interval of 10,000 seconds (i.e. a timer that ticks only once) had very nearly the same performance in both cases: 6.3 seconds elapsed for the fixed kernel versus 5.5 seconds for the unfixed kernel. With fewer threads (eight in these tests), the Case 1 test ran in essentially the same time on both the modified and unmodified kernels (5.2 seconds versus 5.8 seconds). The Case 2 test ran in about the same time as well, 5.9 seconds versus 5.4 seconds but again with much better tick accuracy, .013 seconds per tick versus .025 seconds per tick for the unmodified kernel. Since the fix affected the rlimit code, I also tested soft and hard CPU limits. Case 4: With a hard CPU limit of 20 seconds and eight threads (and an itimer running), the modified kernel was very slightly favored in that while it killed the process in 19.997 seconds of CPU time (5.002 seconds of wall time), only .003 seconds of that was system time, the rest was user time. The unmodified kernel killed the process in 20.001 seconds of CPU (5.014 seconds of wall time) of which .016 seconds was system time. Really, though, the results were too close to call. The results were essentially the same with no itimer running. Case 5: With a soft limit of 20 seconds and a hard limit of 2000 seconds (where the hard limit would never be reached) and an itimer running, the modified kernel exhibited worse tick accuracy than the unmodified kernel: .050 seconds/tick versus .028 seconds/tick. Otherwise, performance was almost indistinguishable. With no itimer running this test exhibited virtually identical behavior and times in both cases. In times past I did some limited performance testing. those results are below. On a four-cpu Opteron system without this fix, a sixteen-thread test executed in 3569.991 seconds, of which user was 3568.435s and system was 1.556s. On the same system with the fix, user and elapsed time were about the same, but system time dropped to 0.007 seconds. Performance with eight, four and one thread were comparable. Interestingly, the timer ticks with the fix seemed more accurate: The sixteen-thread test with the fix received 149543 ticks for 0.024 seconds per tick, while the same test without the fix received 58720 for 0.061 seconds per tick. Both cases were configured for an interval of 0.01 seconds. Again, the other tests were comparable. Each thread in this test computed the primes up to 25,000,000. I also did a test with a large number of threads, 100,000 threads, which is impossible without the fix. In this case each thread computed the primes only up to 10,000 (to make the runtime manageable). System time dominated, at 1546.968 seconds out of a total 2176.906 seconds (giving a user time of 629.938s). It received 147651 ticks for 0.015 seconds per tick, still quite accurate. There is obviously no comparable test without the fix. Signed-off-by: Frank Mayhar <fmayhar@google.com> Cc: Roland McGrath <roland@redhat.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-09-14 16:25:35 +02:00
Ingo Molnar	6e03f99803	Merge branch 'linus' into x86/iommu Conflicts: lib/swiotlb.c Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-09-14 14:07:00 +02:00
Linus Torvalds	7c22a3d853	Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev * 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev: [libata] LBA28/LBA48 off-by-one bug in ata.h sata_inic162x: enable LED blinking ata: duplicate variable sparse warning	2008-09-13 14:48:14 -07:00
Alex Dubov	8e82f8c34b	memstick: fix MSProHG 8-bit interface mode support - 8-bit interface mode never worked properly. The only adapter I have which supports the 8b mode (the Jmicron) had some problems with its clock wiring and they discovered it only now. We also discovered that ProHG media is more sensitive to the ordering of initialization commands. - Make the driver fall back to highest supported mode instead of always falling back to serial. The driver will attempt the switch to 8b mode for any new MSPro card, but not all of them support it. Previously, these new cards ended up in serial mode, which is not the best idea (they work fine with 4b, after all). - Edit some macros for better conformance to Sony documentation Signed-off-by: Alex Dubov <oakad@yahoo.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-09-13 14:41:52 -07:00
Mel Gorman	5bead2a068	mm: mark the correct zone as full when scanning zonelists The iterator for_each_zone_zonelist() uses a struct zoneref *z cursor when scanning zonelists to keep track of where in the zonelist it is. The zoneref that is returned corresponds to the the next zone that is to be scanned, not the current one. It was intended to be treated as an opaque list. When the page allocator is scanning a zonelist, it marks elements in the zonelist corresponding to zones that are temporarily full. As the zonelist is being updated, it uses the cursor here; if (NUMA_BUILD) zlc_mark_zone_full(zonelist, z); This is intended to prevent rescanning in the near future but the zoneref cursor does not correspond to the zone that has been found to be full. This is an easy misunderstanding to make so this patch corrects the problem by changing zoneref cursor to be the current zone being scanned instead of the next one. Signed-off-by: Mel Gorman <mel@csn.ul.ie> Cc: Andy Whitcroft <apw@shadowen.org> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: <stable@kernel.org> [2.6.26.x] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-09-13 14:41:52 -07:00
Hiroshi DOYU	dea420ce0e	include/linux/ioport.h: add missing macro argument for devm_release_* family akpm: these have no callers at this time, but they shall soon, so let's get them right. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Hiroshi DOYU <Hiroshi.DOYU@nokia.com> Cc: Tony Lindgren <tony@atomide.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-09-13 14:41:50 -07:00
Taisuke Yamada	97b697a11b	[libata] LBA28/LBA48 off-by-one bug in ata.h I recently bought 3 HGST P7K500-series 500GB SATA drives and had trouble accessing the block right on the LBA28-LBA48 border. Here's how it fails (same for all 3 drives): # dd if=/dev/sdc bs=512 count=1 skip=268435455 > /dev/null dd: reading `/dev/sdc': Input/output error 0+0 records in 0+0 records out 0 bytes (0 B) copied, 0.288033 seconds, 0.0 kB/s # dmesg ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0 ata1.00: BMDMA stat 0x25 ata1.00: cmd c8/00:08:f8:ff:ff/00:00:00:00:00/ef tag 0 dma 4096 in res 51/04:08:f8:ff:ff/00:00:00:00:00/ef Emask 0x1 (device error) ata1.00: status: { DRDY ERR } ata1.00: error: { ABRT } ata1.00: configured for UDMA/33 ata1: EH complete ... After some investigations, it turned out this seems to be caused by misinterpretation of the ATA specification on LBA28 access. Following part is the code in question: === include/linux/ata.h === static inline int lba_28_ok(u64 block, u32 n_block) { /* check the ending block number / return ((block + n_block - 1) < ((u64)1 << 28)) && (n_block <= 256); } HGST drive (sometimes) fails with LBA28 access of {block = 0xfffffff, n_block = 1}, and this behavior seems to be comformant. Other drives, including other HGST drives are not that strict, through. >From the ATA specification: (http://www.t13.org/Documents/UploadedDocuments/project/d1410r3b-ATA-ATAPI-6.pdf) 8.15.29 Word (61:60): Total number of user addressable sectors This field contains a value that is one greater than the total number of user addressable sectors (see 6.2). The maximum value that shall be placed in this field is 0FFFFFFFh. So the driver shouldn't use the value of 0xfffffff for LBA28 request as this exceeds maximum user addressable sector. The logical maximum value for LBA28 is 0xffffffe. The obvious fix is to cut "- 1" part, and the patch attached just do that. I've been using the patched kernel for about a month now, and the same fix is also floating on the net for some time. So I believe this fix works reliably. Just FYI, many Windows/Intel platform users also seems to be struck by this, and HGST has issued a note pointing to Intel ICH8/9 driver. "28-bit LBA command is being used to access LBAs 29-bits in length" `b531b8bce8` Also, BSDs seems to have similar fix included sometime around ~2004, through I have not checked out exact portion of the code. Signed-off-by: Taisuke Yamada <tai@rakugaki.org> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>	2008-09-13 16:46:15 -04:00
Alexander Duyck	ca9b0e27e0	pkt_action: add new action skbedit This new action will have the ability to change the priority and/or queue_mapping fields on an sk_buff. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-09-12 16:30:20 -07:00
Alexander Duyck	92651940ab	pkt_sched: Add multiqueue scheduler support This patch is intended to add a qdisc to support the new tx multiqueue architecture by providing a band for each hardware queue. By doing this it is possible to support a different qdisc per physical hardware queue. This qdisc uses the skb->queue_mapping to select which band to place the traffic onto. It then uses a round robin w/ a check to see if the subqueue is stopped to determine which band to dequeue the packet from. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-09-12 16:29:34 -07:00
David S. Miller	c655705037	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6	2008-09-11 15:46:02 -07:00
Johannes Berg	44d414dbff	mac80211: move some HT code out of mlme.c Some of the HT code in mlme.c is misplaced: * constants/definitions belong to the ieee80211.h header * code being used in other modes as well shouldn't be there Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2008-09-11 15:53:37 -04:00

... 17 18 19 20 21 ...

13101 commits