Idling logic was disabled in some corner cases, leading to unfair share
for noidle queues.
* the idle timer was not armed if there were other requests in the
driver. unfortunately, those requests could come from other workloads,
or queues for which we don't enable idling. So we will check only
pending requests from the active queue
* rq_noidle check on no-idle queue could disable the end of tree idle if
the last completed request was rq_noidle. Now, we will disable that
idle only if all the queues served in the no-idle tree had rq_noidle
requests.
Reported-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Corrado Zoccolo <czoccolo@gmail.com>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Seeky sync queues with large depth can gain unfairly big share of disk
time, at the expense of other seeky queues. This patch ensures that
idling will be enabled for queues with I/O depth at least 4, and small
think time. The decision to enable idling is sticky, until an idle
window times out without seeing a new request.
The reasoning behind the decision is that, if an application is using
large I/O depth, it is already optimized to make full utilization of
the hardware, and therefore we reserve a slice of exclusive use for it.
Reported-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Corrado Zoccolo <czoccolo@gmail.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
An incoming no-idle queue should preempt the active no-idle queue
only if the active queue is idling due to service tree empty.
Previous code was buggy in two ways:
* it relied on service_tree field to be set on the active queue, while
it is not set when the code is idling for a new request
* it didn't check for the service tree empty condition, so could lead to
LIFO behaviour if multiple queues with depth > 1 were preempting each
other on an non-NCQ device.
Reported-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Corrado Zoccolo <czoccolo@gmail.com>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Make it readable in the source too, not just in the assembly output.
No change in functionality.
Cc: Brian Gerst <brgerst@gmail.com>
LKML-Reference: <1259176706-5908-1-git-send-email-brgerst@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Zero the input register in the exception handler instead of
using an extra register to pass in a zero value.
Signed-off-by: Brian Gerst <brgerst@gmail.com>
LKML-Reference: <1259176706-5908-1-git-send-email-brgerst@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The mce_disable_cpu() and mce_reenable_cpu() are called only
from mce_cpu_callback() which is marked as __cpuinit.
So these functions can be __cpuinit too.
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Cc: Andi Kleen <ak@linux.intel.com>
LKML-Reference: <4B0E3C4E.4090809@jp.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Limit the number of per cpu calibration messages by only
printing out results for the first cpu to boot.
Also, don't print "CPUx is down" as this is expected, and we
don't need 4096 reminders... ;-)
Signed-off-by: Mike Travis <travis@sgi.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Roland Dreier <rdreier@cisco.com>
Cc: Randy Dunlap <rdunlap@xenotime.net>
Cc: Tejun Heo <tj@kernel.org>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Greg Kroah-Hartman <gregkh@suse.de>
Cc: Yinghai Lu <yhlu.kernel@gmail.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Cc: Jack Steiner <steiner@sgi.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <20091118002219.889552000@alcatraz.americas.sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Limit the number of per cpu TSC sync messages by only printing
to the console if an error occurs, otherwise print as a DEBUG
message.
The info message "Skipping synchronization ..." is only printed
after the last cpu has booted.
Signed-off-by: Mike Travis <travis@sgi.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Roland Dreier <rdreier@cisco.com>
Cc: Randy Dunlap <rdunlap@xenotime.net>
Cc: Tejun Heo <tj@kernel.org>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Greg Kroah-Hartman <gregkh@suse.de>
Cc: Yinghai Lu <yhlu.kernel@gmail.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Cc: Jack Steiner <steiner@sgi.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <20091118002222.181053000@alcatraz.americas.sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
BugLink: https://bugtrack.alsa-project.org/alsa-bug/view.php?id=4792
Cristian reported that these models have really bad sound above 6 dB
and proposed the original patch. I've updated the comment to reflect
this change.
Signed-off-by: Daniel T Chen <crimsun@ubuntu.com>
Reported-by: Cristian Klein
Signed-off-by: Takashi Iwai <tiwai@suse.de>
CFQ's detection of queueing devices initially assumes a queuing device
and detects if the queue depth reaches a certain threshold.
However, it will reconsider this choice periodically.
Unfortunately, if device is considered not queuing, CFQ will force a
unit queue depth for some workloads, thus defeating the detection logic.
This leads to poor performance on queuing hardware,
since the idle window remains enabled.
Given this premise, switching to hw_tag = 0 after we have proved at
least once that the device is NCQ capable is not a good choice.
The new detection code starts in an indeterminate state, in which CFQ behaves
as if hw_tag = 1, and then, if for a long observation period we never saw
large depth, we switch to hw_tag = 0, otherwise we stick to hw_tag = 1,
without reconsidering it again.
Signed-off-by: Corrado Zoccolo <czoccolo@gmail.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
There seems to be a regression in direct write path due to following
commit in for-2.6.33 branch of block tree.
commit 1af60fbd75
Author: Jeff Moyer <jmoyer@redhat.com>
Date: Fri Oct 2 18:56:53 2009 -0400
block: get rid of the WRITE_ODIRECT flag
Marking direct writes as WRITE_SYNC_PLUG instead of WRITE_ODIRECT, sets
the NOIDLE flag in bio and hence in request. This tells CFQ to not expect
more request from the queue and not idle on it (despite the fact that
queue's think time is less and it is not seeky).
So direct writers lose big time when competing with sequential readers.
Using fio, I have run one direct writer and two sequential readers and
following are the results with 2.6.32-rc7 kernel and with for-2.6.33
branch.
Test
====
1 direct writer and 2 sequential reader running simultaneously.
[global]
directory=/mnt/sdc/fio/
runtime=10
[seqwrite]
rw=write
size=4G
direct=1
[seqread]
rw=read
size=2G
numjobs=2
2.6.32-rc7
==========
direct writes: aggrb=2,968KB/s
readers : aggrb=101MB/s
for-2.6.33 branch
=================
direct write: aggrb=19KB/s
readers aggrb=137MB/s
This patch brings back the WRITE_ODIRECT flag, with the difference that we
don't set the BIO_RW_UNPLUG flag so that device is not unplugged after
submission of request and an explicit unplug from submitter is required.
That way we fix the jeff's issue of not enough merging taking place in aio
path as well as make sure direct writes get their fair share.
After the fix
=============
for-2.6.33 + fix
----------------
direct writes: aggrb=2,728KB/s
reads: aggrb=103MB/s
Thanks
Vivek
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
cfq_should_idle returns false for no-idle queues that are not the last,
so the control flow will never reach the removed code in a state that
satisfies the if condition.
The unreachable code was added to emulate previous cfq behaviour for
non-NCQ rotational devices. My tests show that even without it, the
performances and fairness are comparable with previous cfq, thanks to
the fact that all seeky queues are grouped together, and that we idle at
the end of the tree.
Signed-off-by: Corrado Zoccolo <czoccolo@gmail.com>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
If the new percpu tree is combined with the perf events tree
the following new warning triggers:
kernel/hw_breakpoint.c: In function 'toggle_bp_task_slot':
kernel/hw_breakpoint.c:151: warning: 'task_bp_pinned' is used uninitialized in this function
Because it's not valid anymore to define a local variable
and a percpu variable (even if it's file scope local) with
the same name.
Rename the local variable to resolve this.
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: K.Prasad <prasad@linux.vnet.ibm.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
LKML-Reference: <200911260701.nAQ71owx016356@imap1.linux-foundation.org>
[ v2: added changelog ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
When we schedule out a breakpoint from the cpu, we also
incidentally remove the "Global exact breakpoint" flag from the
breakpoint control register. It makes us losing the fine grained
precision about the origin of the instructions that may trigger
breakpoint exceptions for the other breakpoints running in this
cpu.
Reported-by: Prasad <prasad@linux.vnet.ibm.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <1259211878-6013-1-git-send-regression-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This simplifies the error handling when we create a breakpoint.
We don't need to check the NULL return value corner case anymore
since we have improved perf_event_create_kernel_counter() to
always return an error code in the failure case.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Prasad <prasad@linux.vnet.ibm.com>
LKML-Reference: <1259210142-5714-3-git-send-regression-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
In fail case, perf_event_create_kernel_counter() returns NULL
instead of an error, which doesn't help us to inform the user
about the origin of the problem from the outer most callers.
Often we can just return -EINVAL, which doesn't help anyone when
it's eventually about a memory allocation failure.
Then, this patch makes perf_event_create_kernel_counter() always
return a detailed error code.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Prasad <prasad@linux.vnet.ibm.com>
LKML-Reference: <1259210142-5714-2-git-send-regression-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The error path of a breakpoint modification is broken in
the ksym tracer. A modified breakpoint hlist node is immediately
released after its removal. Also we leak a breakpoint in this
case.
Fix the path.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Prasad <prasad@linux.vnet.ibm.com>
LKML-Reference: <1259210142-5714-1-git-send-regression-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Writes may take some time on EEPROMs, so for consecutive writes, we already
have a loop waiting for the EEPROM to become ready. Use such a loop for reads,
too, in case somebody wants to immediately read after a write. Detailed bug
report and test case can be found here:
http://article.gmane.org/gmane.linux.drivers.i2c/4660
Reported-by: Aleksandar Ivanov <ivanov.aleks@gmail.com>
Signed-off-by: Wolfram Sang <w.sang@pengutronix.de>
Tested-by: Aleksandar Ivanov <ivanov.aleks@gmail.com>
Cc: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
According to the TAOS Application Note 'Controlling a Backlight with
the TSL2550 Ambient Light Sensor' (page 14), the actual lux value in
extended mode should be obtained multiplying the calculated lux value
by 5.
Signed-off-by: Michele Jr De Candia <michele.decandia@valueteam.com>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Mtdblock driver doesn't call flush_dcache_page for pages in request. So,
this causes problems on architectures where the icache doesn't fill from
the dcache or with dcache aliases. The patch fixes this.
The ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE symbol was introduced to avoid
pointless empty cache-thrashing loops on architectures for which
flush_dcache_page() is a no-op. Every architecture was provided with this
flush pages on architectires where ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE is
equal 1 or do nothing otherwise.
See "fix mtd_blkdevs problem with caches on some architectures" discussion
on LKML for more information.
Signed-off-by: Ilya Loginov <isloginov@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Peter Horton <phorton@bitbox.co.uk>
Cc: "Ed L. Cashin" <ecashin@coraid.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
For the moment, different workload cfq queues are put into different
service trees. But CFQ still uses "busy_queues" to estimate rb_key
offset when inserting a cfq queue into a service tree. I think this
isn't appropriate, and it should make use of service tree count to do
this estimation. This patch is for for-2.6.33 branch.
Signed-off-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Use DECLARE_EVENT_CLASS to remove duplicate code:
text data bss dec hex filename
4312 524 12 4848 12f0 kernel/trace/power-traces.o.old
3455 524 8 3987 f93 kernel/trace/power-traces.o
Two events are converted:
power: power_start, power_frequency
No change in functionality.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Arjan van de Ven <arjan@infradead.org>
LKML-Reference: <4B0E28C2.1090906@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Use DECLARE_EVENT_CLASS to remove duplicate code:
text data bss dec hex filename
13171 800 72 14043 36db kernel/workqueue.o.old
12243 800 68 13111 3337 kernel/workqueue.o
Two events are converted:
workqueue: workqueue_insertion, workqueue_execution
No change in functionality.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <4B0E289F.5010104@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Use DECLARE_EVENT_CLASS to remove duplicate code:
text data bss dec hex filename
12781 952 36 13769 35c9 kernel/softirq.o.old
11981 952 32 12965 32a5 kernel/softirq.o
Two events are converted:
softirq: softirq_entry, softirq_exit
No change in functionality.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <4B0E287F.4030708@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Use DECLARE_EVENT_CLASS to remove duplicate code:
text data bss dec hex filename
29854 1980 128 31962 7cda kernel/module.o.old
28750 1980 128 30858 788a kernel/module.o
Two events are converted:
module_refcnt: module_get, module_put
No change in functionality.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <4B0E283B.3010508@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
It is not quite obvious at first sight what TRACE_EVENT_TEMPLATE
does: does it define an event as well beyond defining a template?
To clarify this, rename it to DECLARE_EVENT_CLASS, which follows
the various 'DECLARE_*()' idioms we already have in the kernel:
DECLARE_EVENT_CLASS(class)
DEFINE_EVENT(class, event1)
DEFINE_EVENT(class, event2)
DEFINE_EVENT(class, event3)
To complete this logic we should also rename TRACE_EVENT() to:
DEFINE_SINGLE_EVENT(single_event)
... but in a more quiet moment of the kernel cycle.
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <4B0E286A.2000405@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Make the initialization more readable, plus tidy up a few small
visual details as well.
No change in functionality.
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Keysyms stored in key_map[] are not simply K() values, but U(K()) values,
as can be seen in the KDSKBENT ioctl handler. The kernel-generated
braille keysyms thus need a U() call too.
Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
As side effect, consume less stack.
-rtl8169_get_mac_version [vmlinux]: 432
-rtl8169_init_one [vmlinux]: 376
+rtl8169_init_one [vmlinux]: 136
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
These algorithms use a truncation of 192/256 bits, as specified
in RFC4868.
Signed-off-by: Martin Willi <martin@strongswan.org>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Instead of using the hardcoded truncation for authentication
algorithms, use the truncation length specified on xfrm_state.
Signed-off-by: Martin Willi <martin@strongswan.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Adding a xfrm_state requires an authentication algorithm specified
either as xfrm_algo or as xfrm_algo_auth with a specific truncation
length. For compatibility, both attributes are dumped to userspace,
and we also accept both attributes, but prefer the new syntax.
If no truncation length is specified, or the authentication algorithm
is specified using xfrm_algo, the truncation length from the algorithm
description in the kernel is used.
Signed-off-by: Martin Willi <martin@strongswan.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The new XFRMA_ALG_AUTH_TRUNC attribute taking a xfrm_algo_auth as
argument allows the installation of authentication algorithms with
a truncation length specified in userspace, i.e. SHA256 with 128 bit
instead of 96 bit truncation.
Signed-off-by: Martin Willi <martin@strongswan.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rewrite statistics accumulation to be in terms of structure fields,
not raw u32 additions. Keep them in same order, though.
This is the last user of create_proc_read_entry() in net/,
please NAK all new ones as well as all new ->write_proc, ->read_proc and
create_proc_entry() users. Cc me if there are problems. :-)
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Some unused includes removed.
In an effort to cleanup nfsd headers and move private
definitions to source directory.
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Generated with the following semantic patch
@@
struct net *n1;
struct net *n2;
@@
- n1 == n2
+ net_eq(n1, n2)
@@
struct net *n1;
struct net *n2;
@@
- n1 != n2
+ !net_eq(n1, n2)
applied over {include,net,drivers/net}.
Signed-off-by: Octavian Purdila <opurdila@ixiacom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, ide_cmd_ioctl when invoked for setting DMA transfer mode calls
ide_find_dma_mode with requested mode as XFER_UDMA_6. This prevents setting DMA
mode to any other value than the default (maximum) supported by the device (or
UDMA6, if supported) irrespective of the actual requested transfer mode and
returns error.
For example, setting mode to UDMA2 using hdparm, where UDMA4 is the default
transfer mode gives following error:
# ./hdparm -d1 -Xudma2 /dev/hda
/dev/hda:hda: UDMA/66 mode selected
setting using_dma to 1 (on)
hda: UDMA/66 mode selected
setting xfermode to 66 (UltraDMA mode2)
HDIO_DRIVE_CMD(setxfermode) failed: Invalid argument
using_dma = 1 (on)
This patch fixes the issue.
Signed-off-by: Hemant Pedanekar <hemantp@ti.com>
Acked-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Acked-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds a minimal defconfig for AM3517 EVM board.
Signed-off-by: Ranjith Lohithakshan <ranjithl@ti.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>