Commit graph

220789 commits

Author SHA1 Message Date
Richard Weinberger
6915e04f88 um: remove PAGE_SIZE alignment in linker script causing kernel segfault.
The linker script cleanup that I did in commit 5d150a97f9 ("um: Clean up
linker script using standard macros.") (2.6.32) accidentally introduced an
ALIGN(PAGE_SIZE) when converting to use INIT_TEXT_SECTION; Richard
Weinberger reported that this causes the kernel to segfault with
CONFIG_STATIC_LINK=y.

I'm not certain why this extra alignment is a problem, but it seems likely
it is because previously

__init_begin = _stext = _text = _sinittext

and with the extra ALIGN(PAGE_SIZE), _sinittext becomes different from the
rest.  So there is likely a bug here where something is assuming that
_sinittext is the same as one of those other symbols.  But reverting the
accidental change fixes the regression, so it seems worth committing that
now.

Signed-off-by: Tim Abbott <tabbott@ksplice.com>
Reported-by: Richard Weinberger <richard@nod.at>
Cc: Jeff Dike <jdike@addtoit.com>
Tested by: Antoine Martin <antoine@nagafix.co.uk>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-10-26 16:52:04 -07:00
Robin Holt
09358972bf sgi-xp: incoming XPC channel messages can come in after the channel's partition structures have been torn down
Under some workloads, some channel messages have been observed being
delayed on the sending side past the point where the receiving side has
been able to tear down its partition structures.

This condition is already detected in xpc_handle_activate_IRQ_uv(), but
that information is not given to xpc_handle_activate_mq_msg_uv().  As a
result, xpc_handle_activate_mq_msg_uv() assumes the structures still exist
and references them, causing a NULL-pointer deref.

Signed-off-by: Robin Holt <holt@sgi.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-10-26 16:52:03 -07:00
Richard Weinberger
482db6df17 um: fix global timer issue when using CONFIG_NO_HZ
This fixes a issue which was introduced by fe2cc53e ("uml: track and make
up lost ticks").

timeval_to_ns() returns long long and not int.  Due to that UML's timer
did not work properlt and caused timer freezes.

Signed-off-by: Richard Weinberger <richard@nod.at>
Acked-by: Pekka Enberg <penberg@kernel.org>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-10-26 16:52:03 -07:00
Mel Gorman
b7f50cfa36 mm, page-allocator: do not check the state of a non-existant buddy during free
There is a bug in commit 6dda9d55 ("page allocator: reduce fragmentation
in buddy allocator by adding buddies that are merging to the tail of the
free lists") that means a buddy at order MAX_ORDER is checked for merging.
 A page of this order never exists so at times, an effectively random
piece of memory is being checked.

Alan Curry has reported that this is causing memory corruption in
userspace data on a PPC32 platform (http://lkml.org/lkml/2010/10/9/32).
It is not clear why this is happening.  It could be a cache coherency
problem where pages mapped in both user and kernel space are getting
different cache lines due to the bad read from kernel space
(http://lkml.org/lkml/2010/10/13/179).  It could also be that there are
some special registers being io-remapped at the end of the memmap array
and that a read has special meaning on them.  Compiler bugs have been
ruled out because the assembly before and after the patch looks relatively
harmless.

This patch fixes the problem by ensuring we are not reading a possibly
invalid location of memory.  It's not clear why the read causes corruption
but one way or the other it is a buggy read.

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Cc: Corrado Zoccolo <czoccolo@gmail.com>
Reported-by: Alan Curry <pacman@kosh.dhis.org>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-10-26 16:52:03 -07:00
Andrew Morton
a75d377686 types.h: move misplaced comment
This comment landed in the wrong place.

Cc: Andi Kleen <andi@firstfloor.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: David Miller <davem@davemloft.net>
Cc: Eric Paris <eparis@redhat.com>
Cc: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-10-26 16:52:03 -07:00
KAMEZAWA Hiroyuki
f8f72ad539 mm: fix return value of scan_lru_pages in memory unplug
scan_lru_pages returns pfn. So, it's type should be "unsigned long"
not "int".

Note: I guess this has been work until now because memory hotplug tester's
      machine has not very big memory....
      physical address < 32bit << PAGE_SHIFT.

Reported-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-10-26 16:52:03 -07:00
Roland Dreier
116e9535fe Merge branches 'amso1100', 'cma', 'cxgb3', 'cxgb4', 'ehca', 'iboe', 'ipoib', 'misc', 'mlx4', 'nes', 'qib' and 'srp' into for-next 2010-10-26 16:09:11 -07:00
Jason Gunthorpe
2ca78d23a7 IB/qib: clean up properly if pci_set_consistent_dma_mask() fails
Clean up properly if pci_set_consistent_dma_mask() fails.

Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-10-26 16:09:02 -07:00
Ralph Campbell
5d26a1df23 IB/qib: Allow driver to load if PCIe AER fails
Some PCIe root complex chip sets don't support advanced error reporting.
Allow the driver to load OK if pci_enable_pcie_error_reporting() fails.

Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-10-26 16:09:02 -07:00
Ralph Campbell
9e43e0106d IB/qib: Fix uninitialized pointer if CONFIG_PCI_MSI not set
If CONFIG_PCI_MSI is not set, and a QLE7140 is present, the pointer
"dd" is uninitialized.

Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-10-26 16:09:02 -07:00
Jason Gunthorpe
82fdb0ab54 IB/qib: Fix extra log level in qib_early_err()
Noticed this odd looking thing in dmesg:

    ib_qib 0000:02:00.0: <3>ib_qib: Unable to enable pcie error reporting: -5

which is due to a bad use of dev_info.

Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Acked-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-10-26 16:09:02 -07:00
Bjorn Helgaas
1af3c2e45e x86: allocate space within a region top-down
Request that allocate_resource() use available space from high addresses
first, rather than the default of using low addresses first.

The most common place this makes a difference is when we move or assign
new PCI device resources.  Low addresses are generally scarce, so it's
better to use high addresses when possible.  This follows Windows practice
for PCI allocation.

Reference: https://bugzilla.kernel.org/show_bug.cgi?id=16228#c42
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2010-10-26 15:33:45 -07:00
Bjorn Helgaas
419afdf53c x86: update iomem_resource end based on CPU physical address capabilities
The iomem_resource map reflects the available physical address space.
We statically initialize the end to -1, i.e., 0xffffffff_ffffffff, but
of course we can only use as much as the CPU can address.

This patch updates the end based on the CPU capabilities, so we don't
mistakenly allocate space that isn't usable, as we're likely to do when
allocating from the top-down.

Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2010-10-26 15:33:44 -07:00
Bjorn Helgaas
dc9887dc02 x86/PCI: allocate space from the end of a region, not the beginning
Allocate from the end of a region, not the beginning.

For example, if we need to allocate 0x800 bytes for a device on bus
0000:00 given these resources:

    [mem 0xbff00000-0xdfffffff] PCI Bus 0000:00
      [mem 0xc0000000-0xdfffffff] PCI Bus 0000:02

the available space at [mem 0xbff00000-0xbfffffff] is passed to the
alignment callback (pcibios_align_resource()).  Prior to this patch, we
would put the new 0x800 byte resource at the beginning of that available
space, i.e., at [mem 0xbff00000-0xbff007ff].

With this patch, we put it at the end, at [mem 0xbffff800-0xbfffffff].

Reference: https://bugzilla.kernel.org/show_bug.cgi?id=16228#c41
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2010-10-26 15:33:42 -07:00
Bjorn Helgaas
b126b4703a PCI: allocate bus resources from the top down
Allocate space from the highest-address PCI bus resource first, then work
downward.

Previously, we looked for space in PCI host bridge windows in the order
we discovered the windows.  For example, given the following windows
(discovered via an ACPI _CRS method):

    pci_root PNP0A03:00: host bridge window [mem 0x000a0000-0x000bffff]
    pci_root PNP0A03:00: host bridge window [mem 0x000c0000-0x000effff]
    pci_root PNP0A03:00: host bridge window [mem 0x000f0000-0x000fffff]
    pci_root PNP0A03:00: host bridge window [mem 0xbff00000-0xf7ffffff]
    pci_root PNP0A03:00: host bridge window [mem 0xff980000-0xff980fff]
    pci_root PNP0A03:00: host bridge window [mem 0xff97c000-0xff97ffff]
    pci_root PNP0A03:00: host bridge window [mem 0xfed20000-0xfed9ffff]

we attempted to allocate from [mem 0x000a0000-0x000bffff] first, then
[mem 0x000c0000-0x000effff], and so on.

With this patch, we allocate from [mem 0xff980000-0xff980fff] first, then
[mem 0xff97c000-0xff97ffff], [mem 0xfed20000-0xfed9ffff], etc.

Allocating top-down follows Windows practice, so we're less likely to
trip over BIOS defects in the _CRS description.

On the machine above (a Dell T3500), the [mem 0xbff00000-0xbfffffff] region
doesn't actually work and is likely a BIOS defect.  The symptom is that we
move the AHCI controller to 0xbff00000, which leads to "Boot has failed,
sleeping forever," a BUG in ahci_stop_engine(), or some other boot failure.

Reference: https://bugzilla.kernel.org/show_bug.cgi?id=16228#c43
Reference: https://bugzilla.redhat.com/show_bug.cgi?id=620313
Reference: https://bugzilla.redhat.com/show_bug.cgi?id=629933
Reported-by: Brian Bloniarz <phunge0@hotmail.com>
Reported-and-tested-by: Stefan Becker <chemobejk@gmail.com>
Reported-by: Denys Vlasenko <dvlasenk@redhat.com>
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2010-10-26 15:33:37 -07:00
Bjorn Helgaas
e7f8567db9 resources: support allocating space within a region from the top down
Allocate space from the top of a region first, then work downward,
if an architecture desires this.

When we allocate space from a resource, we look for gaps between children
of the resource.  Previously, we always looked at gaps from the bottom up.
For example, given this:

    [mem 0xbff00000-0xf7ffffff] PCI Bus 0000:00
      [mem 0xbff00000-0xbfffffff] gap -- available
      [mem 0xc0000000-0xdfffffff] PCI Bus 0000:02
      [mem 0xe0000000-0xf7ffffff] gap -- available

we attempted to allocate from the [mem 0xbff00000-0xbfffffff] gap first,
then the [mem 0xe0000000-0xf7ffffff] gap.

With this patch an architecture can choose to allocate from the top gap
[mem 0xe0000000-0xf7ffffff] first.

We can't do this across the board because iomem_resource.end is initialized
to 0xffffffff_ffffffff on 64-bit architectures, and most machines can't
address the entire 64-bit physical address space.  Therefore, we only
allocate top-down if the arch requests it by clearing
"resource_alloc_from_bottom".

Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2010-10-26 15:33:31 -07:00
Bjorn Helgaas
a1862e3107 resources: handle overflow when aligning start of available area
If tmp.start is near ~0, ALIGN(tmp.start) may overflow, which would
make us think there's more available space than there really is.  We
would likely return something that conflicts with a previous resource,
which would cause a failure when allocate_resource() requests the newly-
allocated region.

Reference: https://bugzilla.redhat.com/show_bug.cgi?id=646027
Reported-by: Fabrice Bellet <fabrice@bellet.info>
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2010-10-26 15:33:28 -07:00
Bjorn Helgaas
6909ba14c2 resources: ensure callback doesn't allocate outside available space
The alignment callback returns a proposed location, which may have been
adjusted to avoid ISA aliases or for other architecture-specific reasons.

We already had a check ("tmp.start < tmp.end") to make sure the callback
doesn't return an area that extends past the available area.  This patch
reworks the check to make sure it doesn't return an area that extends
either below or above the available area.

Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2010-10-26 15:33:26 -07:00
Bjorn Helgaas
5d6b1fa301 resources: factor out resource_clip() to simplify find_resource()
This factors out the min/max clipping to simplify find_resource().
No functional change.

Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2010-10-26 15:33:24 -07:00
Bjorn Helgaas
a9cea01741 resources: add a default alignf to simplify find_resource()
This removes a test from find_resource(), which is getting cluttered.
No functional change.

Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2010-10-26 15:33:22 -07:00
Joe Perches
aa1ad26089 RDMA/cxgb4: Remove unnecessary KERN_<level> use
Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-10-26 13:45:59 -07:00
Joe Perches
ca7cf94f8b RDMA/cxgb3: Remove unnecessary KERN_<level> use
Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-10-26 13:45:49 -07:00
Len Brown
c25d29952b intel_idle: do not use the LAPIC timer for ATOM C2
If we use the LAPIC timer during ATOM C2 on
some nvidia chisets, the system stalls.

https://bugzilla.kernel.org/show_bug.cgi?id=21032

Signed-off-by: Len Brown <len.brown@intel.com>
2010-10-26 15:45:17 -04:00
Glenn Wurster
7a876b0efc IPv6: Temp addresses are immediately deleted.
There is a bug in the interaction between ipv6_create_tempaddr and
addrconf_verify.  Because ipv6_create_tempaddr uses the cstamp and tstamp
from the public address in creating a private address, if we have not
received a router advertisement in a while, tstamp + temp_valid_lft might be
< now.  If this happens, the new address is created inside
ipv6_create_tempaddr, then the loop within addrconf_verify starts again and
the address is immediately deleted.  We are left with no temporary addresses
on the interface, and no more will be created until the public IP address is
updated.  To avoid this, set the expiry time to be the minimum of the time
left on the public address or the config option PLUS the current age of the
public interface.

Signed-off-by: Glenn Wurster <gwurster@scs.carleton.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-26 12:35:13 -07:00
Glenn Wurster
aed65501e8 IPv6: Create temporary address if none exists.
If privacy extentions are enabled, but no current temporary address exists,
then create one when we get a router advertisement.

Signed-off-by: Glenn Wurster <gwurster@scs.carleton.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-26 12:35:12 -07:00
Stefano Stabellini
ea5b8f7393 xen: initialize cpu masks for pv guests in xen_smp_init
Pv guests don't have ACPI and need the cpu masks to be set
correctly as early as possible so we call xen_fill_possible_map from
xen_smp_init.
On the other hand the initial domain supports ACPI so in this case we skip
xen_fill_possible_map and rely on it. However Xen might limit the number
of cpus usable by the domain, so we filter those masks during smp
initialization using the VCPUOP_is_up hypercall.
It is important that the filtering is done before
xen_setup_vcpu_info_placement.

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
2010-10-26 20:33:15 +01:00
Liam Girdwood
e94be5f362 ASoC: fsl - fix build error in pcm030-audio-fabric.c
Fix build error:-

sound/soc/fsl/pcm030-audio-fabric.c:27:33: fatal error:
sound/soc-of-simple.h: No such file or directory

Signed-off-by: Liam Girdwood <lrg@slimlogic.co.uk>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
2010-10-26 21:28:56 +02:00
Julia Lawall
3342b9680f sound/oss/sb_ess.c: delete double assignment
Delete successive assignments to the same location.

A simplified version of the semantic match that finds this problem is as
follows: (http://coccinelle.lip6.fr/)

// <smpl>
@@
expression i;
@@

*i = ...;
 i = ...;
// </smpl>

Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
2010-10-26 21:28:05 +02:00
Arnaldo Carvalho de Melo
00204c3396 perf python scripting: Add futex-contention script
The equivalent to this SystemTAP script:

http://sourceware.org/systemtap/wiki/WSFutexContention

[root@doppio ~]# perf trace futex-contention
Press control+C to stop and show the summary
^Cnpviewer.bin[15242] lock 7f0a8be19104 contended 29 times, 72806 avg ns
npviewer.bin[15242] lock 7f0a8be19130 contended 2 times, 1355 avg ns
synergyc[17245] lock f127f4 contended 1 times, 1830569 avg ns
firefox[15116] lock 7f2b7238af0c contended 168 times, 1230390 avg ns
synergyc[17245] lock f2fc20 contended 1 times, 33149 avg ns
npviewer.bin[15255] lock 7f0a8be19074 contended 155 times, 73047 avg ns
npviewer.bin[15255] lock 7f0a8be190a0 contended 127 times, 7088 avg ns
synergyc[17247] lock f12854 contended 1 times, 46741 avg ns
synergyc[17245] lock f12610 contended 1 times, 7358 avg ns
[root@doppio ~]#

Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Stephane Eranian <eranian@google.com>
Cc: Tom Zanussi <tzanussi@gmail.com>
LKML-Reference: <new-submission>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2010-10-26 17:07:33 -02:00
Len Brown
7e31842441 Merge branch 'misc' into release 2010-10-26 14:51:00 -04:00
Len Brown
1bd64d42ab Merge branch 'acpi-mmio' into release
Conflicts:
	drivers/acpi/osl.c

Signed-off-by: Len Brown <len.brown@intel.com>
2010-10-26 14:50:56 -04:00
Eric Dumazet
ded85aa86b fib_hash: fix rcu sparse and logical errors
While fixing CONFIG_SPARSE_RCU_POINTER errors, I had to fix accesses to
fz->fz_hash for real.

-	&fz->fz_hash[fn_hash(f->fn_key, fz)]
+	rcu_dereference(fz->fz_hash) + fn_hash(f->fn_key, fz)

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-26 11:42:39 -07:00
Eric Dumazet
ebb9fed2de fib: fix fib_nl_newrule()
Some panic reports in fib_rules_lookup() show a rule could have a NULL
pointer as a next pointer in the rules_list.

This can actually happen because of a bug in fib_nl_newrule() : It
checks if current rule is the destination of unresolved gotos. (Other
rules have gotos to this about to be inserted rule)

Problem is it does the resolution of the gotos before the rule is
inserted in the rules_list (and has a valid next pointer)

Fix this by moving the rules_list insertion before the changes on gotos.

A lockless reader can not any more follow a ctarget pointer, unless
destination is ready (has a valid next pointer)

Reported-by: Oleg A. Arkhangelsky <sysoleg@yandex.ru>
Reported-by: Joe Buehler <aspam@cox.net>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-26 11:42:38 -07:00
Linus Torvalds
f9ba5375a8 Merge branch 'ima-memory-use-fixes'
* ima-memory-use-fixes:
  IMA: fix the ToMToU logic
  IMA: explicit IMA i_flag to remove global lock on inode_delete
  IMA: drop refcnt from ima_iint_cache since it isn't needed
  IMA: only allocate iint when needed
  IMA: move read counter into struct inode
  IMA: use i_writecount rather than a private counter
  IMA: use inode->i_lock to protect read and write counters
  IMA: convert internal flags from long to char
  IMA: use unsigned int instead of long for counters
  IMA: drop the inode opencount since it isn't needed for operation
  IMA: use rbtree instead of radix tree for inode information cache
2010-10-26 11:37:48 -07:00
Eric Paris
bade72d607 IMA: fix the ToMToU logic
Current logic looks like this:

        rc = ima_must_measure(NULL, inode, MAY_READ, FILE_CHECK);
        if (rc < 0)
                goto out;

        if (mode & FMODE_WRITE) {
                if (inode->i_readcount)
                        send_tomtou = true;
                goto out;
        }

        if (atomic_read(&inode->i_writecount) > 0)
                send_writers = true;

Lets assume we have a policy which states that all files opened for read
by root must be measured.

Lets assume the file has permissions 777.

Lets assume that root has the given file open for read.

Lets assume that a non-root process opens the file write.

The non-root process will get to ima_counts_get() and will check the
ima_must_measure().  Since it is not supposed to measure it will goto
out.

We should check the i_readcount no matter what since we might be causing
a ToMToU voilation!

This is close to correct, but still not quite perfect.  The situation
could have been that root, which was interested in the mesurement opened
and closed the file and another process which is not interested in the
measurement is the one holding the i_readcount ATM.  This is just overly
strict on ToMToU violations, which is better than not strict enough...

Signed-off-by: Eric Paris <eparis@redhat.com>
Acked-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-10-26 11:37:19 -07:00
Eric Paris
196f518128 IMA: explicit IMA i_flag to remove global lock on inode_delete
Currently for every removed inode IMA must take a global lock and search
the IMA rbtree looking for an associated integrity structure.  Instead
we explicitly mark an inode when we add an integrity structure so we
only have to take the global lock and do the removal if it exists.

Signed-off-by: Eric Paris <eparis@redhat.com>
Acked-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-10-26 11:37:19 -07:00
Eric Paris
64c62f06be IMA: drop refcnt from ima_iint_cache since it isn't needed
Since finding a struct ima_iint_cache requires a valid struct inode, and
the struct ima_iint_cache is supposed to have the same lifetime as a
struct inode (technically they die together but don't need to be created
at the same time) we don't have to worry about the ima_iint_cache
outliving or dieing before the inode.  So the refcnt isn't useful.  Just
get rid of it and free the structure when the inode is freed.

Signed-off-by: Eric Paris <eapris@redhat.com>
Acked-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-10-26 11:37:19 -07:00
Eric Paris
bc7d2a3e66 IMA: only allocate iint when needed
IMA always allocates an integrity structure to hold information about
every inode, but only needed this structure to track the number of
readers and writers currently accessing a given inode.  Since that
information was moved into struct inode instead of the integrity struct
this patch stops allocating the integrity stucture until it is needed.
Thus greatly reducing memory usage.

Signed-off-by: Eric Paris <eparis@redhat.com>
Acked-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-10-26 11:37:18 -07:00
Eric Paris
a178d2027d IMA: move read counter into struct inode
IMA currently allocated an inode integrity structure for every inode in
core.  This stucture is about 120 bytes long.  Most files however
(especially on a system which doesn't make use of IMA) will never need
any of this space.  The problem is that if IMA is enabled we need to
know information about the number of readers and the number of writers
for every inode on the box.  At the moment we collect that information
in the per inode iint structure and waste the rest of the space.  This
patch moves those counters into the struct inode so we can eventually
stop allocating an IMA integrity structure except when absolutely
needed.

This patch does the minimum needed to move the location of the data.
Further cleanups, especially the location of counter updates, may still
be possible.

Signed-off-by: Eric Paris <eparis@redhat.com>
Acked-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-10-26 11:37:18 -07:00
Eric Paris
b9593d309d IMA: use i_writecount rather than a private counter
IMA tracks the number of struct files which are holding a given inode
readonly and the number which are holding the inode write or r/w.  It
needs this information so when a new reader or writer comes in it can
tell if this new file will be able to invalidate results it already made
about existing files.

aka if a task is holding a struct file open RO, IMA measured the file
and recorded those measurements and then a task opens the file RW IMA
needs to note in the logs that the old measurement may not be correct.
It's called a "Time of Measure Time of Use" (ToMToU) issue.  The same is
true is a RO file is opened to an inode which has an open writer.  We
cannot, with any validity, measure the file in question since it could
be changing.

This patch attempts to use the i_writecount field to track writers.  The
i_writecount field actually embeds more information in it's value than
IMA needs but it should work for our purposes and allow us to shrink the
struct inode even more.

Signed-off-by: Eric Paris <eparis@redhat.com>
Acked-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-10-26 11:37:18 -07:00
Eric Paris
ad16ad00c3 IMA: use inode->i_lock to protect read and write counters
Currently IMA used the iint->mutex to protect the i_readcount and
i_writecount.  This patch uses the inode->i_lock since we are going to
start using in inode objects and that is the most appropriate lock.

Signed-off-by: Eric Paris <eparis@redhat.com>
Acked-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-10-26 11:37:18 -07:00
Eric Paris
15aac67677 IMA: convert internal flags from long to char
The IMA flags is an unsigned long but there is only 1 flag defined.
Lets save a little space and make it a char.  This packs nicely next to
the array of u8's.

Signed-off-by: Eric Paris <eparis@redhat.com>
Acked-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-10-26 11:37:18 -07:00
Eric Paris
497f323370 IMA: use unsigned int instead of long for counters
Currently IMA uses 2 longs in struct inode.  To save space (and as it
seems impossible to overflow 32 bits) we switch these to unsigned int.
The switch to unsigned does require slightly different checks for
underflow, but it isn't complex.

Signed-off-by: Eric Paris <eparis@redhat.com>
Acked-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-10-26 11:37:18 -07:00
Eric Paris
b575156daf IMA: drop the inode opencount since it isn't needed for operation
The opencount was used to help debugging to make sure that everything
which created a struct file also correctly made the IMA calls.  Since we
moved all of that into the VFS this isn't as necessary.  We should be
able to get the same amount of debugging out of just the reader and
write count.

Signed-off-by: Eric Paris <eparis@redhat.com>
Acked-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-10-26 11:37:17 -07:00
Eric Paris
8549164143 IMA: use rbtree instead of radix tree for inode information cache
The IMA code needs to store the number of tasks which have an open fd
granting permission to write a file even when IMA is not in use.  It
needs this information in order to be enabled at a later point in time
without losing it's integrity garantees.

At the moment that means we store a little bit of data about every inode
in a cache.  We use a radix tree key'd on the inode's memory address.
Dave Chinner pointed out that a radix tree is a terrible data structure
for such a sparse key space.  This patch switches to using an rbtree
which should be more efficient.

Bug report from Dave:

 "I just noticed that slabtop was reporting an awfully high usage of
  radix tree nodes:

   OBJS ACTIVE  USE OBJ SIZE  SLABS OBJ/SLAB CACHE SIZE NAME
  4200331 2778082  66%    0.55K 144839       29   2317424K radix_tree_node
  2321500 2060290  88%    1.00K  72581       32   2322592K xfs_inode
  2235648 2069791  92%    0.12K  69864       32    279456K iint_cache

  That is, 2.7M radix tree nodes are allocated, and the cache itself is
  consuming 2.3GB of RAM.  I know that the XFS inodei caches are indexed
  by radix tree node, but for 2 million cached inodes that would mean a
  density of 1 inode per radix tree node, which for a system with 16M
  inodes in the filsystems is an impossibly low density.  The worst I've
  seen in a production system like kernel.org is about 20-25% density,
  which would mean about 150-200k radix tree nodes for that many inodes.
  So it's not the inode cache.

  So I looked up what the iint_cache was.  It appears to used for
  storing per-inode IMA information, and uses a radix tree for indexing.
  It uses the *address* of the struct inode as the indexing key.  That
  means the key space is extremely sparse - for XFS the struct inode
  addresses are approximately 1000 bytes apart, which means the closest
  the radix tree index keys get is ~1000.  Which means that there is a
  single entry per radix tree leaf node, so the radix tree is using
  roughly 550 bytes for every 120byte structure being cached.  For the
  above example, it's probably wasting close to 1GB of RAM...."

Reported-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
Acked-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-10-26 11:37:17 -07:00
Joe Perches
1941bf8c8d drivers/atm/eni.c: Remove multiple uses of KERN_<level>
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-26 11:35:37 -07:00
Rafael J. Wysocki
f2dc0d1809 tg3: Do not call device_set_wakeup_enable() under spin_lock_bh
The tg3 driver calls device_set_wakeup_enable() under spin_lock_bh,
which causes a problem to happen after the recent core power
management changes, because this function can sleep now.  Fix this
by moving the device_set_wakeup_enable() call out of the
spin_lock_bh-protected area.

Reported-by: Maxim Levitsky <maximlevitsky@gmail.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-26 11:34:09 -07:00
David S. Miller
78fd9c4491 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6 2010-10-26 11:32:28 -07:00
Bryan Schumaker
eb1c86b8b5 NFS: rename nfs.upcall -> nfs.idmap
This patch renames the idmapper upcall program from nfs.upcall to nfs.idmap in
the NFS documentation.  This is because the program has been renamed in the
nfs-utils source.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-10-26 13:57:10 -04:00
Trond Myklebust
036a107597 NFS: Fix a compile issue in nfs_root
Stephen Rothwell reports:

> /home/test/linux-2.6/fs/nfs/nfsroot.c: In function 'nfs_root_debug':
> /home/test/linux-2.6/fs/nfs/nfsroot.c:110:2: error: 'nfs_debug'
> undeclared (first use in this function)
> /home/test/linux-2.6/fs/nfs/nfsroot.c:110:2: note: each undeclared
> identifier is reported only once for each function it appears in
> make[3]: *** [fs/nfs/nfsroot.o] Error 1
> make[2]: *** [fs/nfs] Error 2
> make[1]: *** [fs] Error 2
> make: *** [sub-make] Error 2

Which is caused by commit 306a075362
(NFS: Allow NFSROOT debugging messages to be enabled dynamically)

Fix is to disable this code when RPC_DEBUG is disabled.

Reported-by: Zimny Lech <napohybelskurwysynom2010@gmail.com>
Tested-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-10-26 13:56:42 -04:00