The MMU context_lock can be taken from switch_mm() while the
rq->lock is held. The rq->lock can also be taken from interrupts,
thus if we get interrupted in destroy_context() with the context
lock held and that interrupt tries to take the rq->lock, there's
a possible deadlock scenario with another CPU having the rq->lock
and calling switch_mm() which takes our context lock.
The fix is to always ensure interrupts are off when taking our
context lock. The switch_mm() path is already good so this fixes
the destroy_context() path.
While at it, turn the context lock into a new style spinlock.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This patch fixes a couple of issues that can happen as a result
of steal_context() dropping the context_lock when all possible
PIDs are ineligible for stealing (hopefully an extremely hard to
hit occurence).
This case exposes the possibility of a stale context_mm[] entry
to be seen since destroy_context() doesn't clear it and the free
map isn't re-tested. It also means steal_context() will not notice
a context freed while the lock was help, thus possibly trying to
steal a context when a free one was available.
This fixes it by always returning to the caller from steal_context
when it dropped the lock with a return value that causes the
caller to re-samble the number of free contexts, along with
properly clearing the context_mm[] array for destroyed contexts.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Now that we support changing the chunksize, we calculate
"reshape_sectors" to be the max of number of sectors in old
and new chunk size.
However there is one please where we still use 'chunksize'
rather than 'reshape_sectors'.
This causes a reshape that reduces the size of chunks to freeze.
Signed-off-by: NeilBrown <neilb@suse.de>
Currently the PCM resources are allocated only once and ever in prepare
callback, assuming that the PCM parameters are never changed. But it's
not true.
This patch adds the call of atc->pcm_release_resources() at hw_params
and hw_free callbacks to assure that the PCM setup is done correctly
for each h/w parameter changes.
Signed-off-by: Takashi Iwai <tiwai@suse.de>
The SRC instances may not exist when PCM pointer callback is called at
the state before initialization is finished. Add the NULL check just
to be sure.
Signed-off-by: Takashi Iwai <tiwai@suse.de>
md has functionality to 'quiesce' and array so that all pending
IO completed and no new IO starts. This is used to achieve a
stable state before making internal changes.
Currently this quiescing applies equally to normal IO, resync
IO, and reshape IO.
However there is a problem with applying it to reshape IO.
Reshape can have multiple 'stripe_heads' that must be active together.
If the quiesce come between allocating the first and the last of
such a collection, then we deadlock, as the last will not be allocated
until the quiesce is lifted, the quiesce will not be lifted until the
first (which has been allocated) gets used, and that first cannot be
used until the last is allocated.
It is not necessary to inhibit reshape IO when a quiesce is
requested. Those places in the code that require a full quiesce will
ensure the reshape thread is not running at all.
So allow reshape requests to get access to new stripe_heads without
being blocked by a 'quiesce'.
This only affects in-place reshapes (i.e. where the array does not
grow or shrink) and these are only newly supported. So this patch is
not needed in earlier kernels.
Signed-off-by: NeilBrown <neilb@suse.de>
mddev->raid_disks can be changed and any time by a request from
user-space. It is a suggestion as to what number of raid_disks is
desired.
conf->raid_disks can only be changed by the raid5 module with suitable
locks in place. It is a statement as to the current number of
raid_disks.
There are two places where the latter should be used, but the former
is used. This can lead to a crash when reshaping an array.
This patch changes to mddev-> to conf->
Signed-off-by: NeilBrown <neilb@suse.de>
DM no longer needs to set limits explicitly when calling blk_stack_limits.
Let the latter automatically deal with bounce_pfn scaling.
Fix kerneldoc variable names.
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Make SCSI reset error handler decode unit attention ASC
and after a target reset wait for a unit attention that indicates
a reset occurred rather than just for any old unit attention.
Signed-off-by: Stephen M. Cameron <scameron@beardog.cca.cpqcorp.net>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Now that the cciss SCSI error handling routines operate with interrupts
enabled, we no longer need to maintain the list of command completions that
sendcmd() might inadvertantly scoop up, since now it only runs at driver init
time, and there won't be any other commands for it to scoop up. So we
can remove that list and the code that adds to it and processes it.
Signed-off-by: Stephen M. Cameron <scameron@beardog.cca.cpqcorp.net>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Change cciss scsi error handling routines to work with interrupts enabled.
Signed-off-by: Stephen M. Cameron <scameron@beardog.cca.cpqcorp.net>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Separate the error processing from sendcmd_withirq_core from the code
which retries commands. The rationale for this is that the SCSI error
handling code can then be made to use sendcmd_withirq_core, but avoid
retrying commands.
Signed-off-by: Stephen M. Cameron <scameron@beardog.cca.cpqcorp.net>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Factor out code to process target status of completed commands in sendcmd()
and sendcmd_withirq_core(), and fix problem that bad target status was ignored in
sendcmd_withirq_core.
Signed-off-by: Stephen M. Cameron <scameron@beardog.cca.cpqcorp.net>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Simplify interfaces of sendcmd() and sendcmd_withirq() so that they
provide only one way to address commands instead of three ways.
Signed-off-by: Stephen M. Cameron <scameron@beardog.cca.cpqcorp.net>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Factor the core of sendcmd_withirq out to provide a simpler interface
which provides access to full error information.
Signed-off-by: Stephen M. Cameron <scameron@beardog.cca.cpqcorp.net>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Use schedule_timeout_uninterruptible instead of schedule_timeout in the
scsi error handling code when waiting between TUR polls since we are not
interested in nor want to be interrupted by signals.
Signed-off-by: Stephen M. Cameron <scameron@beardog.cca.cpqcorp.net>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Tejun's "block: set rq->resid_len to blk_rq_bytes() on issue" patch
seems to be incomplete; It doesn't set rq->resid_len to blk_rq_bytes()
for a bidi request (req->next_rq). As a result, all bidi users are
broken.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Reworked by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This adds block-step support on powerpc, including a PTRACE_SINGLEBLOCK
request for ptrace.
The BookE implementation is tweaked to fire a single step after a
block step in order to mimmic the server behaviour.
Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This patch adds some descriptions of lists and structures.
This patch contains no code changes.
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: James Morris <jmorris@namei.org>
TOMOYO 2.2.0 is not using total_len field of "struct tomoyo_path_info".
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: James Morris <jmorris@namei.org>
Furthermore, it twiddles with the details of SKB list handling
directly, which we're trying to eliminate.
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support to ibmvscsi for the capabilities MAD. This command gets sent
to the Virtual I/O server prior to login in order to communicate client
capabilities. Additionally it returns information regarding capabilities
that the server supports. The two main capabilities communicated in this
MAD are related to partition migration and client reserve. Client reserve
allows for SCSI-2 reservations to be sent to virtual disks which are backed
by physical LUNs and will result in the reservation being sent to the
physical LUN.
Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
A new mode of error reporting, fast fail, has been added to the VIOS
which allows failover to happen more quickly.
If this new fast fail mode is enabled on the VIOS and the vSCSI client
supports the mode, the VIOS will not return MEDIUM error on path failures,
but rather return VIOSRP_ADAPTER_FAIL in the crq response, which
ibmvscsi will translate to DID_ERROR.
This new mode can be enabled for single path configurations as well,
so it is the new default error reporting mode. A module parameter is
provided to disable this new behavior on the off chance it causes a
problem on some old VIOS version.
Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
The ibmvscsi driver currently sends the SRP Login before sending the Adapter
Info MAD, which can result in commands getting sent to the virtual adapter
before we are ready for them. This results in a slight window where the target
devices may not behave as expected. Change the order and close the window.
Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Previously we had one timeout that was used for all types of operations.
This adds specific timeout values for different operations (init, login,
adapter info MAD, abort task, and LUN reset).
Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Adds support for 16 byte CDBs to the ibmvscsi driver.
Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
On Sun, 7 Jun 2009, Ingo Molnar wrote:
> Testing tracer sched_switch: <6>Starting ring buffer hammer
> PASSED
> Testing tracer sysprof: PASSED
> Testing tracer function: PASSED
> Testing tracer irqsoff:
> =============================================
> PASSED
> Testing tracer preemptoff: PASSED
> Testing tracer preemptirqsoff: [ INFO: possible recursive locking detected ]
> PASSED
> Testing tracer branch: 2.6.30-rc8-tip-01972-ge5b9078-dirty #5760
> ---------------------------------------------
> rb_consumer/431 is trying to acquire lock:
> (&cpu_buffer->reader_lock){......}, at: [<c109eef7>] ring_buffer_reset_cpu+0x37/0x70
>
> but task is already holding lock:
> (&cpu_buffer->reader_lock){......}, at: [<c10a019e>] ring_buffer_consume+0x7e/0xc0
>
> other info that might help us debug this:
> 1 lock held by rb_consumer/431:
> #0: (&cpu_buffer->reader_lock){......}, at: [<c10a019e>] ring_buffer_consume+0x7e/0xc0
The ring buffer is a generic structure, and can be used outside of
ftrace. If ftrace traces within the use of the ring buffer, it can produce
false positives with lockdep.
This patch passes in a static lock key into the allocation of the ring
buffer, so that different ring buffers will have their own lock class.
Reported-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1244477919.13761.9042.camel@twins>
[ store key in ring buffer descriptor ]
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
The rule is:
- high overhead: red
- mid overhead: green
- low overhead: normal (white/black)
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch adds support for profiling JIT generated code to 'perf
report'. A JIT compiler is required to generate a "/tmp/perf-$PID.map"
symbols map that is parsed when looking and displaying symbols.
Thanks to Peter Zijlstra for his help with this patch!
Example "perf report" output with the Jato JIT:
#
# (40311 samples)
#
# Overhead Command Shared Object Symbol
# ........ ................ ......................... ......
#
97.80% jato /tmp/perf-11915.map [.] Fibonacci.fib(I)I
0.56% jato 00000000b7fa023b 0x000000b7fa023b
0.45% jato /tmp/perf-11915.map [.] Fibonacci.main([Ljava/lang/String;)V
0.38% jato [kernel] [k] get_page_from_freelist
0.06% jato [kernel] [k] kunmap_atomic
0.05% jato ./jato [.] utf8Hash
0.04% jato ./jato [.] executeJava
0.04% jato ./jato [.] defineClass
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: a.p.zijlstra@chello.nl
Cc: acme@redhat.com
LKML-Reference: <Pine.LNX.4.64.0906082111590.12407@melkki.cs.Helsinki.FI>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Some JIT compilers allocate memory for generated code with
posix_memalign() + mprotect() so we need to hook into mprotect()
to make sure 'perf' is aware that we're executing code in
anonymous memory.
[ penberg@cs.helsinki.fi: move the hook to sys_mprotect() ]
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
LKML-Reference: <Pine.LNX.4.64.0906082111030.12407@melkki.cs.Helsinki.FI>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Fill in amd_hw_cache_event_id[] with the AMD CPU specific events,
for family 0x0f, 0x10 and 0x11.
There's apparently no distinction between load and store events, so
we only fill in the load events.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
In ide_probe_port() skip probe if ide_port_wait_ready() returns -ENODEV
and print error message instead of debug one if it returns -EBUSY.
v2:
Fix the default 'rc' value.
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Ensure MPS remains in synchronization across all NIC/FCoE
functions after a reset.
Signed-off-by: Andrew Vasquez <andrew.vasquez@qlogic.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Queued work processing will now be serialized with its own
lower-priority spinlock. This also simplifies the work-queue
interface for future work-queue consumers.
Signed-off-by: Andrew Vasquez <andrew.vasquez@qlogic.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
As firmware will ultimately terminate (stop) and port
states-cleared.
Signed-off-by: Andrew Vasquez <andrew.vasquez@qlogic.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
ISP24xx and above must query the host-status register, not HCCR.
Signed-off-by: Andrew Vasquez <andrew.vasquez@qlogic.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
ISP24xx and above ISPs perform a RISC reset in
qla24xx_reset_chip(), which is called prior to
qla24xx_chip_diag().
Signed-off-by: Andrew Vasquez <andrew.vasquez@qlogic.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
With RSCN states not being kept across qla2x00_configure_loop()
invocations, loop-resync distruptions during fabric-discovery may
cause ports to remain in a lost state. Force state
renegotiation during a follow-on configure-loop iteration.
Signed-off-by: Andrew Vasquez <andrew.vasquez@qlogic.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Unlike earlier ISPs, recent ISPs (ISP81xx) can in fact fail this
mailbox command.
Signed-off-by: Andrew Vasquez <andrew.vasquez@qlogic.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>