According to commit 5e3d20a68f
(init: Remove the BKL from startup code) these sparse notations
should be removed also.
Signed-off-by: Namhyung Kim <namhyung@gmail.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
According to Jens, this code does not need the BKL at all,
it is sufficiently serialized by bd_mutex.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Jens Axboe <jaxboe@fusionio.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
The big kernel lock is going away, so make sure
that if it is disabled by Kconfig, we do not
try to validate it, which would result in
compile errors.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
The dvb core only uses the big kernel lock in the open
and ioctl functions, which means it can be replaced with
a dvb specific mutex. Fortunately, all the ioctl functions
go through dvb_usercopy, so we can move the serialization
in there.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Mauro Carvalho Chehab <mchehab@infradead.org>
Cc: linux-media@vger.kernel.org
The bt8xx driver only uses the big kernel lock in its dst_ca_ioctl
function and never to serialize against other code, so we can
trivially replace it with a private mutex.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: linux-media@vger.kernel.org
Cc: Mauro Carvalho Chehab <mchehab@infradead.org>
This driver already has a global mutex, so let's just
use that in the open function instead of the BKL.
It may not even be needed there, but this patch should
have the smallest impact.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Mark Gross <mark.gross@intel.com>
RAW_SETBIND and RAW_GETBIND 32bit versions are fscked in interesting ways.
1) fs/compat_ioctl.c has COMPATIBLE_IOCTL(RAW_SETBIND) followed by
HANDLE_IOCTL(RAW_SETBIND, raw_ioctl). The latter is ignored.
2) on amd64 (and itanic) the damn thing is broken - we have int + u64 + u64
and layouts on i386 and amd64 are _not_ the same. raw_ioctl() would
work there, but it's never called due to (1). As it is, i386 /sbin/raw
definitely doesn't work on amd64 boxen.
3) switching to raw_ioctl() as is would *not* work on e.g. sparc64 and ppc64,
which would be rather sad, seeing that normal userland there is 32bit.
The thing is, slapping __packed on the struct in question does not DTRT -
it eliminates *all* padding. The real solution is to use compat_u64.
4) of course, all that stuff has no business being outside of raw.c in the
first place - there should be ->compat_ioctl() for /dev/rawctl instead of
messing with compat_ioctl.c.
[akpm@linux-foundation.org: coding-style fixes]
[arnd@arndb.de: port to 2.6.36]
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Three uml device drivers still use the big kernel lock,
but all of them can be safely converted to using
a per-driver mutex instead. Most likely this is not
even necessary, so after further review these can
and should be removed as well.
The exec system call no longer requires the BKL either,
so remove it from there, too.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: user-mode-linux-devel@lists.sourceforge.net
Commit a25909a4 (lockdep: Add an in_workqueue_context() lockdep-based
test function) added in_workqueue_context() but there hasn't been any
in-kernel user and the lockdep annotation in workqueue is scheduled to
change. Remove the unused function.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
The documentation for schedule_on_each_cpu() states that it calls a
function on each online CPU from keventd. This can easily be
interpreted as an asyncronous call because the description does not
mention that flush_work is called. Clarify that it is synchronous.
tj: rephrased a bit
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Windows may leave pin power-down registers set after reboot, and
this resulted in muted output on Linux. Reset these registers
at initialization properly.
Signed-off-by: Charles Chin <Charles.Chin@idt.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
lru_add_drain_all() uses schedule_on_each_cpu() which is synchronous.
There is no reason to call flush_scheduled_work() after
lru_add_drain_all(). Drop the spurious calls.
This is to prepare for the deprecation and removal of
flush_scheduled_work().
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
Acked-by: Mel Gorman <mel@csn.ul.ie>
IPv6 encapsulation uses a bad source address for the tunnel.
i.e. VIP will be used as local-addr and encap. dst addr.
Decapsulation will not accept this.
Example
LVS (eth1 2003::2:0:1/96, VIP 2003::2:0:100)
(eth0 2003::1:0:1/96)
RS (ethX 2003::1:0:5/96)
tcpdump
2003::2:0:100 > 2003::1:0:5: IP6 (hlim 63, next-header TCP (6) payload length: 40) 2003::3:0:10.50991 > 2003::2:0:100.http: Flags [S], cksum 0x7312 (correct), seq 3006460279, win 5760, options [mss 1440,sackOK,TS val 1904932 ecr 0,nop,wscale 3], length 0
In Linux IPv6 impl. you can't have a tunnel with an any cast address
receiving packets (I have not tried to interpret RFC 2473)
To have receive capabilities the tunnel must have:
- Local address set as multicast addr or an unicast addr
- Remote address set as an unicast addr.
- Loop back addres or Link local address are not allowed.
This causes us to setup a tunnel in the Real Server with the
LVS as the remote address, here you can't use the VIP address since it's
used inside the tunnel.
Solution
Use outgoing interface IPv6 address (match against the destination).
i.e. use ip6_route_output() to look up the route cache and
then use ipv6_dev_get_saddr(...) to set the source address of the
encapsulated packet.
Additionally, cache the results in new destination
fields: dst_cookie and dst_saddr and properly check the
returned dst from ip6_route_output. We now add xfrm_lookup
call only for the tunneling method where the source address
is a local one.
Signed-off-by:Hans Schillstrom <hans.schillstrom@ericsson.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
This patch allows to listen to events that inform about
expectations destroyed.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
cciss: fix PCI IDs for new controllers
This patch fixes the botched up PCI IDs of new controllers. Please consider
this patch for inclusion.
Signed-off-by: Mike Miller <mike.miller@hp.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
On a system that support intr-rempping when booting with "intremap=off"
[ 177.895501] BUG: unable to handle kernel NULL pointer dereference at 00000000000000f8
[ 177.913316] IP: [<ffffffff8145fc18>] free_irte+0x47/0xc0
...
[ 178.173326] Call Trace:
[ 178.173574] [<ffffffff810515b4>] destroy_irq+0x3a/0x75
[ 178.192934] [<ffffffff81051834>] arch_teardown_msi_irq+0xe/0x10
[ 178.193418] [<ffffffff81458dc3>] arch_teardown_msi_irqs+0x56/0x7f
[ 178.213021] [<ffffffff81458e79>] free_msi_irqs+0x8d/0xeb
Call free_irte only when interrupt remapping is enabled.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <4CBCB274.7010108@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Commit c3f00c70 ("perf: Separate find_get_context() from event
initialization") changed the generic perf_event code to call
perf_event_alloc, which calls the arch-specific event_init code,
before looking up the context for the new event. Unfortunately,
power_pmu_event_init uses event->ctx->task to see whether the
new event is a per-task event or a system-wide event, and thus
crashes since event->ctx is NULL at the point where
power_pmu_event_init gets called.
(The reason it needs to know whether it is a per-task event is
because there are some hardware events on Power systems which
only count when the processor is not idle, and there are some
fixed-function counters which count such events. For example,
the "run cycles" event counts cycles when the processor is not
idle. If the user asks to count cycles, we can use "run cycles"
if this is a per-task event, since the processor is running when
the task is running, by definition. We can't use "run cycles"
if the user asks for "cycles" on a system-wide counter.)
Fortunately the information we need is in the
event->attach_state field, so we just use that instead.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <20101019055535.GA10398@drongo>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Reported-by: Alexey Kardashevskiy <aik@au1.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
/proc/diskstats would display a strange output as follows.
$ cat /proc/diskstats |grep sda
8 0 sda 90524 7579 102154 20464 0 0 0 0 0 14096 20089
8 1 sda1 19085 1352 21841 4209 0 0 0 0 4294967064 15689 4293424691
~~~~~~~~~~
8 2 sda2 71252 3624 74891 15950 0 0 0 0 232 23995 1562390
8 3 sda3 54 487 2188 92 0 0 0 0 0 88 92
8 4 sda4 4 0 8 0 0 0 0 0 0 0 0
8 5 sda5 81 2027 2130 138 0 0 0 0 0 87 137
Its reason is the wrong way of accounting hd_struct->in_flight. When a bio is
merged into a request belongs to different partition by ELEVATOR_FRONT_MERGE.
The detailed root cause is as follows.
Assuming that there are two partition, sda1 and sda2.
1. A request for sda2 is in request_queue. Hence sda1's hd_struct->in_flight
is 0 and sda2's one is 1.
| hd_struct->in_flight
---------------------------
sda1 | 0
sda2 | 1
---------------------------
2. A bio belongs to sda1 is issued and is merged into the request mentioned on
step1 by ELEVATOR_BACK_MERGE. The first sector of the request is changed
from sda2 region to sda1 region. However the two partition's
hd_struct->in_flight are not changed.
| hd_struct->in_flight
---------------------------
sda1 | 0
sda2 | 1
---------------------------
3. The request is finished and blk_account_io_done() is called. In this case,
sda2's hd_struct->in_flight, not a sda1's one, is decremented.
| hd_struct->in_flight
---------------------------
sda1 | -1
sda2 | 1
---------------------------
The patch fixes the problem by caching the partition lookup
inside the request structure, hence making sure that the increment
and decrement will always happen on the same partition struct. This
also speeds up IO with accounting enabled, since it cuts down on
the number of lookups we have to do.
When reloading partition tables, quiesce IO to ensure that no
request references to the partition struct exists. When it is safe
to free the partition table, the IO for that device is restarted
again.
Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Cc: stable@kernel.org
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
KBUILD_MODNAME normalizes "-" to "_". This is non-obvious and results in
the id name for ADP5588 being "adp5588_keys" while the other supported id
is "adp5587-keys". So avoid this define and use an explicit string as the
id name.
Signed-off-by: Michael Hennerich <michael.hennerich@analog.com>
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
There are several error paths in the code that do not unmap DMA. This
patch adds calls to svc_rdma_unmap_dma to free these DMA contexts.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
There was logic in the send path that assumed that a page containing data
to send to the client has a KVA. This is not always the case and can result
in data corruption when page_address returns zero and we end up DMA mapping
zero.
This patch changes the bus mapping logic to avoid page_address() where
necessary and converts all calls from ib_dma_map_single to ib_dma_map_page
in order to keep the map/unmap calls symmetric.
Signed-off-by: Tom Tucker <tom@ogc.us>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
This reverts commit f6765502f8 and adds
the missing include file.
Signed-off-by: Peter Hsiang <Peter.Hsiang@maxim-ic.com>
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
In this code, 0 is returned on failure, even though other
failures return -ENOMEM or other similar values.
A simplified version of the semantic match that finds this problem is as
follows: (http://coccinelle.lip6.fr/)
// <smpl>
@a@
identifier alloc;
identifier ret;
constant C;
expression x;
@@
x = alloc(...);
if (x == NULL) { <+... \(ret = -C; \| return -C; \) ...+> }
@@
identifier f, a.alloc;
expression ret;
expression x,e1,e2,e3;
@@
ret = 0
... when != ret = e1
*x = alloc(...)
... when != ret = e2
if (x == NULL) { ... when != ret = e3
return ret;
}
// </smpl>
Signed-off-by: Julia Lawall <julia@diku.dk>
Acked-by: Liam Girdwood <lrg@slimlogic.co.uk>
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
This patch fixes build error about GPIO address due to
conflict of commit 4d914705 and 19a2c065.
- commit 4d914705: Fix on GPIO base addresses
- commit 19a2c065: Moves initial map for merging S5P64X0
Signed-off-by: Kukjin Kim <kgene.kim@samsung.com>
When DYNAMIC_FTRACE is enabled and we use the C version of recordmcount,
all objects are run through the recordmcount program to create a
separate section that stores all the callers of mcount.
The build process has a special file: scripts/mod/empty.o. This is
built from empty.c which is literally an empty file (except for a
single comment). This file is used to find information about the target
elf format, like endianness and word size.
The problem comes up when we need to build recordmcount. The
build process requires that empty.o is built first. The build rules
for empty.o will try to execute recordmcount on the empty.o file.
We get an error that recordmcount does not exist.
To avoid this recursion, the build file will skip running recordmcount
if the file that it is building is script/mod/empty.o.
[ extra comment Suggested-by: Sam Ravnborg <sam@ravnborg.org> ]
Reported-by: Ingo Molnar <mingo@elte.hu>
Tested-by: Ingo Molnar <mingo@elte.hu>
Cc: Michal Marek <mmarek@suse.cz>
Cc: linux-kbuild@vger.kernel.org
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
* 'upstream' of git://git.linux-mips.org/pub/scm/upstream-linus:
MIPS: Enable ISA_DMA_API config to fix build failure
MIPS: 32-bit: Fix build failure in asm/fcntl.h
MIPS: Remove all generated vmlinuz* files on "make clean"
MIPS: do_sigaltstack() expects userland pointers
MIPS: Fix error values in case of bad_stack
MIPS: Sanitize restart logics
MIPS: secure_computing, syscall audit: syscall number should in r2, not r0.
MIPS: Don't block signals if we'd failed to setup a sigframe
Get rid of init_MUTEX[_LOCKED]() and use sema_init() instead.
(Ported to current XFS code by <aelder@sgi.com>.)
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Alex Elder <aelder@sgi.com>
This patch reverts the driver to enabling/disabling the NFC interrupt
mask rather than enabling/disabling the system interrupt. This cleans
up the driver so that it doesn't rely on interrupts being disabled
within the interrupt handler.
For i.MX21 we keep the current behaviour, that is calling
enable_irq/disable_irq_nosync to enable/disable interrupts. This patch
is based on earlier work by John Ogness.
Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
Acked-by: John Ogness <john.ogness@linutronix.de>
Tested-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch adds support for 32bit project quota identifiers.
On disk format is backward compatible with 16bit projid numbers. projid
on disk is now kept in two 16bit values - di_projid_lo (which holds the
same position as old 16bit projid value) and new di_projid_hi (takes
existing padding) and converts from/to 32bit value on the fly.
xfs_admin (for existing fs), mkfs.xfs (for new fs) needs to be used
to enable PROJID32BIT support.
Signed-off-by: Arkadiusz Miśkiewicz <arekm@maven.pl>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>
Stop having two different names for many buffer functions and use
the more descriptive xfs_buf_* names directly.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>
We're not actually passing around credentials inside XFS for a while
now, so remove all xfs_cred.h with it's cred_t typedef and all
instances of it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>
This header only provides one extern that isn't actually declared
anywhere, and shadowed by a macro.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>
It used to have a place when it contained an automatically generated
CVS version, but these days it's entirely superflous.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>
Use the correct prototype for xfs_trans_committed instead of casting it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>
These days inode64 should only control which AGs we allocate new
inodes from, while we still try to support reading all existing
inodes. To make this actually work the check ontop of xfs_iget
needs to be relaxed to allow inodes in all allocation groups instead
of just those that we allow allocating inodes from. Note that we
can't simply remove the check - it prevents us from accessing
invalid data when fed invalid inode numbers from NFS or bulkstat.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>
Update the per-cpu counters manually in xfs_trans_unreserve_and_mod_sb
and remove support for per-cpu counters from xfs_mod_incore_sb_batch
to simplify it. And added benefit is that we don't have to take
m_sb_lock for transactions that only modify per-cpu counters.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>
Export xfs_icsb_modify_counters and always use it for modifying
the per-cpu counters. Remove support for per-cpu counters from
xfs_mod_incore_sb to simplify it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>
Fail the mount if we can't allocate memory for the per-CPU counters.
This is consistent with how we handle everything else in the mount
path and makes the superblock counter modification a lot simpler.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>
pahole reports the struct xfs_buf has quite a few holes in it, so
packing the structure better will reduce the size of it by 16 bytes.
Also, move all the fields used in cache lookups into the first
cacheline.
Before on x86_64:
/* size: 320, cachelines: 5 */
/* sum members: 298, holes: 6, sum holes: 22 */
After on x86_64:
/* size: 304, cachelines: 5 */
/* padding: 6 */
/* last cacheline: 48 bytes */
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Alex Elder <aelder@sgi.com>
The buffer cache hash is showing typical hash scalability problems.
In large scale testing the number of cached items growing far larger
than the hash can efficiently handle. Hence we need to move to a
self-scaling cache indexing mechanism.
I have selected rbtrees for indexing becuse they can have O(log n)
search scalability, and insert and remove cost is not excessive,
even on large trees. Hence we should be able to cache large numbers
of buffers without incurring the excessive cache miss search
penalties that the hash is imposing on us.
To ensure we still have parallel access to the cache, we need
multiple trees. Rather than hashing the buffers by disk address to
select a tree, it seems more sensible to separate trees by typical
access patterns. Most operations use buffers from within a single AG
at a time, so rather than searching lots of different lists,
separate the buffer indexes out into per-AG rbtrees. This means that
searches during metadata operation have a much higher chance of
hitting cache resident nodes, and that updates of the tree are less
likely to disturb trees being accessed on other CPUs doing
independent operations.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Alex Elder <aelder@sgi.com>
Memory reclaim via shrinkers has a terrible habit of having N+M
concurrent shrinker executions (N = num CPUs, M = num kswapds) all
trying to shrink the same cache. When the cache they are all working
on is protected by a single spinlock, massive contention an
slowdowns occur.
Wrap the per-ag inode caches with a reclaim mutex to serialise
reclaim access to the AG. This will block concurrent reclaim in each
AG but still allow reclaim to scan multiple AGs concurrently. Allow
shrinkers to move on to the next AG if it can't get the lock, and if
we can't get any AG, then start blocking on locks.
To prevent reclaimers from continually scanning the same inodes in
each AG, add a cursor that tracks where the last reclaim got up to
and start from that point on the next reclaim. This should avoid
only ever scanning a small number of inodes at the satart of each AG
and not making progress. If we have a non-shrinker based reclaim
pass, ignore the cursor and reset it to zero once we are done.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Alex Elder <aelder@sgi.com>