Commit graph

129283 commits

Author SHA1 Message Date
Theodore Ts'o
40a1984d22 jbd2: Submit writes to the journal using WRITE_SYNC
Since we will be waiting the write of the commit record to the journal
to complete in journal_submit_commit_record(), submit it using
WRITE_SYNC.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-01-04 19:55:57 -05:00
Linus Torvalds
fe0bdec68b Merge branch 'audit.b61' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/audit-current
* 'audit.b61' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/audit-current:
  audit: validate comparison operations, store them in sane form
  clean up audit_rule_{add,del} a bit
  make sure that filterkey of task,always rules is reported
  audit rules ordering, part 2
  fixing audit rule ordering mess, part 1
  audit_update_lsm_rules() misses the audit_inode_hash[] ones
  sanitize audit_log_capset()
  sanitize audit_fd_pair()
  sanitize audit_mq_open()
  sanitize AUDIT_MQ_SENDRECV
  sanitize audit_mq_notify()
  sanitize audit_mq_getsetattr()
  sanitize audit_ipc_set_perm()
  sanitize audit_ipc_obj()
  sanitize audit_socketcall
  don't reallocate buffer in every audit_sockaddr()
2009-01-04 16:32:11 -08:00
Baruch Siach
22692018b9 enc28j60: fix RX buffer overflow
The enc28j60 driver doesn't check whether the length of the packet as reported 
by the hardware fits into the preallocated buffer. When stressed, the hardware 
may report insanely large packets even tough the "Receive OK" bit is set. Fix 
this.

Signed-off-by: Baruch Siach <baruch@tkos.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-01-04 16:23:01 -08:00
Roel Kluin
fecc7036e7 isdn: capi: &&/|| typos
Correct two typos.

Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Acked-by: Karsten Keil <kkeil@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-01-04 16:22:04 -08:00
David Howells
14eaddc967 CRED: Fix regression in cap_capable() as shown up by sys_faccessat() [ver #2]
Fix a regression in cap_capable() due to:

	commit 5ff7711e635b32f0a1e558227d030c7e45b4a465
	Author: David Howells <dhowells@redhat.com>
	Date:   Wed Dec 31 02:52:28 2008 +0000

	    CRED: Differentiate objective and effective subjective credentials on a task

The problem is that the above patch allows a process to have two sets of
credentials, and for the most part uses the subjective credentials when
accessing current's creds.

There is, however, one exception: cap_capable(), and thus capable(), uses the
real/objective credentials of the target task, whether or not it is the current
task.

Ordinarily this doesn't matter, since usually the two cred pointers in current
point to the same set of creds.  However, sys_faccessat() makes use of this
facility to override the credentials of the calling process to make its test,
without affecting the creds as seen from other processes.

One of the things sys_faccessat() does is to make an adjustment to the
effective capabilities mask, which cap_capable(), as it stands, then ignores.

The affected capability check is in generic_permission():

	if (!(mask & MAY_EXEC) || execute_ok(inode))
		if (capable(CAP_DAC_OVERRIDE))
			return 0;

This change splits capable() from has_capability() down into the commoncap and
SELinux code.  The capable() security op now only deals with the current
process, and uses the current process's subjective creds.  A new security op -
task_capable() - is introduced that can check any task's objective creds.

strictly the capable() security op is superfluous with the presence of the
task_capable() op, however it should be faster to call the capable() op since
two fewer arguments need be passed down through the various layers.

This can be tested by compiling the following program from the XFS testsuite:

/*
 *  t_access_root.c - trivial test program to show permission bug.
 *
 *  Written by Michael Kerrisk - copyright ownership not pursued.
 *  Sourced from: http://linux.derkeiler.com/Mailing-Lists/Kernel/2003-10/6030.html
 */
#include <limits.h>
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <sys/stat.h>

#define UID 500
#define GID 100
#define PERM 0
#define TESTPATH "/tmp/t_access"

static void
errExit(char *msg)
{
    perror(msg);
    exit(EXIT_FAILURE);
} /* errExit */

static void
accessTest(char *file, int mask, char *mstr)
{
    printf("access(%s, %s) returns %d\n", file, mstr, access(file, mask));
} /* accessTest */

int
main(int argc, char *argv[])
{
    int fd, perm, uid, gid;
    char *testpath;
    char cmd[PATH_MAX + 20];

    testpath = (argc > 1) ? argv[1] : TESTPATH;
    perm = (argc > 2) ? strtoul(argv[2], NULL, 8) : PERM;
    uid = (argc > 3) ? atoi(argv[3]) : UID;
    gid = (argc > 4) ? atoi(argv[4]) : GID;

    unlink(testpath);

    fd = open(testpath, O_RDWR | O_CREAT, 0);
    if (fd == -1) errExit("open");

    if (fchown(fd, uid, gid) == -1) errExit("fchown");
    if (fchmod(fd, perm) == -1) errExit("fchmod");
    close(fd);

    snprintf(cmd, sizeof(cmd), "ls -l %s", testpath);
    system(cmd);

    if (seteuid(uid) == -1) errExit("seteuid");

    accessTest(testpath, 0, "0");
    accessTest(testpath, R_OK, "R_OK");
    accessTest(testpath, W_OK, "W_OK");
    accessTest(testpath, X_OK, "X_OK");
    accessTest(testpath, R_OK | W_OK, "R_OK | W_OK");
    accessTest(testpath, R_OK | X_OK, "R_OK | X_OK");
    accessTest(testpath, W_OK | X_OK, "W_OK | X_OK");
    accessTest(testpath, R_OK | W_OK | X_OK, "R_OK | W_OK | X_OK");

    exit(EXIT_SUCCESS);
} /* main */

This can be run against an Ext3 filesystem as well as against an XFS
filesystem.  If successful, it will show:

	[root@andromeda src]# ./t_access_root /tmp/xxx 0 4043 4043
	---------- 1 dhowells dhowells 0 2008-12-31 03:00 /tmp/xxx
	access(/tmp/xxx, 0) returns 0
	access(/tmp/xxx, R_OK) returns 0
	access(/tmp/xxx, W_OK) returns 0
	access(/tmp/xxx, X_OK) returns -1
	access(/tmp/xxx, R_OK | W_OK) returns 0
	access(/tmp/xxx, R_OK | X_OK) returns -1
	access(/tmp/xxx, W_OK | X_OK) returns -1
	access(/tmp/xxx, R_OK | W_OK | X_OK) returns -1

If unsuccessful, it will show:

	[root@andromeda src]# ./t_access_root /tmp/xxx 0 4043 4043
	---------- 1 dhowells dhowells 0 2008-12-31 02:56 /tmp/xxx
	access(/tmp/xxx, 0) returns 0
	access(/tmp/xxx, R_OK) returns -1
	access(/tmp/xxx, W_OK) returns -1
	access(/tmp/xxx, X_OK) returns -1
	access(/tmp/xxx, R_OK | W_OK) returns -1
	access(/tmp/xxx, R_OK | X_OK) returns -1
	access(/tmp/xxx, W_OK | X_OK) returns -1
	access(/tmp/xxx, R_OK | W_OK | X_OK) returns -1

I've also tested the fix with the SELinux and syscalls LTP testsuites.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: James Morris <jmorris@namei.org>
2009-01-05 11:17:04 +11:00
Herbert Xu
5d38a079ce gro: Add page frag support
This patch allows GRO to merge page frags (skb_shinfo(skb)->frags)
in one skb, rather than using the less efficient frag_list.

It also adds a new interface, napi_gro_frags to allow drivers
to inject page frags directly into the stack without allocating
an skb.  This is intended to be the GRO equivalent for LRO's
lro_receive_frags interface.

The existing GSO interface can already handle page frags with
or without an appended frag_list so nothing needs to be changed
there.

The merging itself is rather simple.  We store any new frag entries
after the last existing entry, without checking whether the first
new entry can be merged with the last existing entry.  Making this
check would actually be easy but since no existing driver can
produce contiguous frags anyway it would just be mental masturbation.

If the total number of entries would exceed the capacity of a
single skb, we simply resort to using frag_list as we do now.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-01-04 16:13:40 -08:00
Herbert Xu
b530256d2e gro: Use gso_size to store MSS
In order to allow GRO packets without frag_list at all, we need to
store the MSS in the packet itself.  The obvious place is gso_size.
The only thing to watch out for is if the packet ends up not being
GRO then we need to clear gso_size before pushing the packet into
the stack.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-01-04 16:13:19 -08:00
Jaswinder Singh Rajput
cfc3a44c3c starfire: use request_firmware()
Firmware blob is big endian

Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-01-04 16:12:11 -08:00
Jaswinder Singh Rajput
077f849de4 firmware: convert tg3 driver to request_firmware()
Firmware blob looks like this...
        u8 firmware_major
        u8 firmware_minor
        u8 firmware_fix
        u8 pad
        __be32 start_address
        __be32 length (total, including BSS sections to be zeroed)
        data... (in __be32 words, which is native for the firmware)

Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-01-04 16:11:25 -08:00
Jaswinder Singh
949b42544a firmware: convert acenic driver to request_firmware()
We store the firmware in its native big-endian form now, so the loop in
ace_copy() is modified to use be32_to_cpup() when writing it out.

We can forget the BSS,SBSS sections of the firmware, since we were
clearing all the device's RAM anyway. And the text,rodata,data sections
can all be loaded as a single chunk since they're contiguous (give or
take a few dozen bytes in between).

Signed-off-by: Jaswinder Singh <jaswinder@infradead.org>
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
Acked-by: Jes Sorensen <jes@sgi.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-01-04 16:10:02 -08:00
David S. Miller
14deae4156 ipv6: Fix sporadic sendmsg -EINVAL when sending to multicast groups.
Thanks to excellent diagnosis by Eduard Guzovsky.

The core problem is that on a network with lots of active
multicast traffic, the neighbour cache can fill up.  If
we try to allocate a new route and thus neighbour cache
entry, the bog-standard GC attempt the neighbour layer does
in ineffective because route entries hold a reference
to the existing neighbour entries and GC can only liberate
entries with no references.

IPV4 already has a way to handle this, by doing a route cache
GC in such situations (when neigh attach returns -ENOBUFS).

So simply mimick this on the ipv6 side.

Tested-by: Eduard Guzovsky <eguzovsky@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-01-04 16:04:39 -08:00
Sam Ravnborg
473321fc37 MAINTAINERS: update sparc maintainer
Reflect the current situation where David Miller
is the sparc maintainer.

I have tried to contact Bill on following adresses:

    wli@holomorphy.com
    wlirwin@us.ibm.com

with no success and Bill has not been active on the
sparclinux mailing list for a long time.

As sparc and sparc64 are unified I unified the two entries
in the MAINTAINERS file too.

Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-01-04 15:47:49 -08:00
Sam Ravnborg
83c86984bf sparc: unify ipcbuf.h
The ony difference is the size of the mode.
sparc has extra padding to compensate for this.

Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-01-04 15:44:52 -08:00
Stefan Richter
c8a12d45d5 firewire: reorder struct fw_card for better cache efficiency
topology_map is by far the largest member in struct fw_card.  Move it to
the very end of the struct so that card pointer dereferences have better
chances to hit the CPU cache.

This requires to increase the topology_map backing store to the size
specified in IEEE 1394, i.e. 256 rather than 255 quadlets.  Otherwise
the topology_map response handler may access invalid memory.

Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
2009-01-04 23:50:38 +01:00
Stefan Richter
d6f95a3d14 firewire: fix resetting of bus manager retry counter
An earlier change, maybe long ago, removed the copying of self_id_count
into card->self_id_count.  Since then each bus reset cleared
card->bm_retries even when it shouldn't.

Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
2009-01-04 23:50:38 +01:00
Jay Fenlason
0fa1986f3a firewire: improve refcounting of fw_card
Take a reference to the card whenever fw_card_bm_work() is scheduled on
that card and release it when the work is done.  This allows us to
remove the cancel_delayed_work_sync() in fw_core_remove_card().

Signed-off-by: Jay Fenlason <fenlason@redhat.com>
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de> (patch update)
2009-01-04 23:50:37 +01:00
Jay Fenlason
2cc489c213 firewire: typo in comment
Signed-off-by: Jay Fenlason <fenlason@redhat.com>
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
2009-01-04 23:50:37 +01:00
Stefan Richter
d6053e08f5 firewire: fix small memory leak at module removal
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
2009-01-04 23:50:37 +01:00
Stefan Richter
621f6dd715 firewire: fw-sbp2: remove unnecessary locking
What was I thinking when I added sbp2_set_generation()?  Its locking did
nothing (except for implicitly providing the necessary barrier between
node IDs update and generation update).

Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
2009-01-04 23:50:36 +01:00
Harvey Harrison
c82cdea1e1 ieee1934: dv1394: interrupt enabling/disabling broken on big-endian
After annotating the frame structs, this was left:
drivers/ieee1394/dv1394.c:2113:23: warning: invalid assignment: |=
drivers/ieee1394/dv1394.c:2113:23:    left side has type restricted __le32
drivers/ieee1394/dv1394.c:2113:23:    right side has type int
drivers/ieee1394/dv1394.c:2121:24: warning: invalid assignment: &=
drivers/ieee1394/dv1394.c:2121:24:    left side has type restricted __le32
drivers/ieee1394/dv1394.c:2121:24:    right side has type int
drivers/ieee1394/dv1394.c:2123:24: warning: invalid assignment: |=
drivers/ieee1394/dv1394.c:2123:24:    left side has type restricted __le32
drivers/ieee1394/dv1394.c:2123:24:    right side has type int

Which looks like a real bug on a big-endian arch as it would set/clear
the wrong bit.

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>

Bill Fink writes:

I finally got a chance to test the patch on my kernel, and live DV
viewing using xine still worked fine.  Although I admit to being
mystified how it works both before and after the patch, since the
cpu_to_le32() calls that were added should result in byte swapping on
PPC that wasn't being done before.  I guess that either the code paths
involved aren't actually being triggered by my xine DV viewing, or
there's some fortuitous palindromic setting of bits.

Tested-by: Bill Fink <billfink@mindspring.com>
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
2009-01-04 23:50:36 +01:00
Harvey Harrison
7d7039d365 ieee1394: dv1394: annotate frame input/output structs as little endian
No Functional changes.

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
2009-01-04 23:50:36 +01:00
Harvey Harrison
faf26bcc47 ieee1394: eth1394: trivial sparse annotations
Mostly annotations of ether_type as a be16.

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
2009-01-04 23:50:35 +01:00
Harvey Harrison
debb48063a ieee1394: mark bus_info_data as a __be32 array
Two access functions get_max_rom and set_hw_config_rom are
changed to take __be32 as well.  Only bus_info_data was
ever passed in so this is OK.  All other uses of bus_info_data
treated it as a be32 value already.

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
2009-01-04 23:50:35 +01:00
Harvey Harrison
a5e6f64dda ieee1394: replace CSR_SET_BUS_INFO_GENERATION macro
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
2009-01-04 23:50:35 +01:00
Harvey Harrison
c816015860 ieee1394: pcilynx: trivial endian annotation
bus_info_block was treated as a be32 everywhere, annotate
as such.  Removes plenty of sparse warnings.

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
2009-01-04 23:50:34 +01:00
Stefan Richter
0bed181968 ieee1394: ignore nonzero Bus_Info_Block.max_rom, fetch config ROM in quadlets
It is already known that buggy firmwares exist which report a bogus
link_spd in their config ROM bus info block.  We now got the first
report of a bogus max_rom too (Freecom FireWire Hard Drive 1TB,
http://bugzilla.kernel.org/show_bug.cgi?id=12206).

I suspect other OSs only use quadlet reads to fetch the config ROM,
otherwise the firmware authors would have noticed their mistake.
Hence limit ieee1394's config ROM fetching routine to quadlets as the
safe minimum regardless of what the bus info block says.

This will potentially slow the bus reset handling by nodemgr somewhat
down.  But most existing devices support only quadlet reads anyway,
hence there will often be no actual difference to before this change.

Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
2009-01-04 23:50:34 +01:00
Harvey Harrison
c1fc58d63d ieee1394: consolidate uses of IEEE1934_BUSID_MAGIC
Move the definition out of nodemgr.h and use it in csr.c/pcilynx.c

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
2009-01-04 23:50:34 +01:00
Stefan Richter
cbe7dd699e ieee1394: ohci1394: flush MMIO writes before delay in initialization
and replace busy-wait by msleep.

Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
2009-01-04 23:50:33 +01:00
Stefan Richter
9e234faf98 ieee1394: ohci1394: pass error codes from request_irq through
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
2009-01-04 23:50:33 +01:00
Frans Pop
d1069aea68 ieee1394: ohci1394: don't leave interrupts enabled during suspend/resume
On my HP 2510p I get the following in dmesg during near the end of most
resumes from suspend to RAM:

irq 19: nobody cared (try booting with the "irqpoll" option)
Pid: 0, comm: swapper Not tainted 2.6.28-rc7 #67
Call Trace:
 <IRQ>  [<ffffffffa00ee9e1>] ? ohci_irq_handler+0x60/0x7e9 [ohci1394]
 [<ffffffff8026aa4d>] __report_bad_irq+0x38/0x87
 [<ffffffff8026abaa>] note_interrupt+0x10e/0x174
 [<ffffffff8026b262>] handle_fasteoi_irq+0xa7/0xd1
 [<ffffffff8020eb87>] do_IRQ+0x73/0xe4
 [<ffffffff8020c626>] ret_from_intr+0x0/0xa
 <EOI>  [<ffffffffa0012606>] ? acpi_idle_enter_bm+0x26b/0x2b2 [processor]
 [<ffffffffa00125fc>] ? acpi_idle_enter_bm+0x261/0x2b2 [processor]
 [<ffffffff8024f30f>] ? notifier_call_chain+0x33/0x5b
 [<ffffffff803b9c64>] ? cpuidle_idle_call+0x8c/0xc4
 [<ffffffff8020b312>] ? cpu_idle+0x4a/0x9a
 [<ffffffff8042c5c8>] ? rest_init+0x5c/0x5e
handlers:
[<ffffffffa00ee981>] (ohci_irq_handler+0x0/0x7e9 [ohci1394])
Disabling IRQ #19

There also seems to be an interrupt storm during suspend/resume when this
happens:
 19:      99968         33   IO-APIC-fasteoi   ohci1394

This patch gets rid of both issues and makes the resume as a whole
significantly faster.

Signed-off-by: Frans Pop <elendil@planet.nl>

As was pointed out in http://lkml.org/lkml/2008/12/6/127, this does not
fix the cause of the interrupt storm.  However, since the source of the
interrupts could not be determined yet, we make the system at least more
usable with this change.

Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
2009-01-04 23:50:33 +01:00
Stefan Richter
b17a550960 ieee1394: mark all hpsb_address_ops instances as const
These are never modified.

Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
2009-01-04 23:50:32 +01:00
Stefan Richter
adb0a61681 ieee1394: replace a GFP_ATOMIC by GFP_KERNEL allocation
All callers of hpsb_register_addrspace() can sleep.

Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
2009-01-04 23:50:32 +01:00
Heiko Carstens
9e01892c42 module: convert to stop_machine_create/destroy.
The module code relies on a non-failing stop_machine call. So we create
the kstop threads in advance and with that make sure the call won't fail.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-01-05 08:40:15 +10:30
Heiko Carstens
9ea09af3bd stop_machine: introduce stop_machine_create/destroy.
Introduce stop_machine_create/destroy. With this interface subsystems
that need a non-failing stop_machine environment can create the
stop_machine machine threads before actually calling stop_machine.
When the threads aren't needed anymore they can be killed with
stop_machine_destroy again.

When stop_machine gets called and the threads aren't present they
will be created and destroyed automatically. This restores the old
behaviour of stop_machine.

This patch also converts cpu hotplug to the new interface since it
is special: cpu_down calls __stop_machine instead of stop_machine.
However the kstop threads will only be created when stop_machine
gets called.

Changing the code so that the threads would be created automatically
on __stop_machine is currently not possible: when __stop_machine gets
called we hold cpu_add_remove_lock, which is the same lock that
create_rt_workqueue would take. So the workqueue needs to be created
before the cpu hotplug code locks cpu_add_remove_lock.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-01-05 08:40:14 +10:30
Helge Deller
c298be7449 parisc: fix module loading failure of large kernel modules
On 32bit (and sometimes 64bit) and with big kernel modules like xfs or
ipv6 the relocation types R_PARISC_PCREL17F and R_PARISC_PCREL22F may
fail to reach their PLT stub if we only create one big stub array for
all sections at the beginning of the core or init section.

With this patch we now instead add individual PLT stub entries
directly in front of the code sections where the stubs are actually
called. This reduces the distance between the PCREL location and the
stub entry so that the relocations can be fulfilled.

While calculating the final layout of the kernel module in memory, the
kernel module loader calls arch_mod_section_prepend() to request the
to be reserved amount of memory in front of each individual section.

Tested with 32- and 64bit kernels.

Signed-off-by: Helge Deller <deller@gmx.de>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-01-05 08:40:14 +10:30
Helge Deller
088af9a6e0 module: fix module loading failure of large kernel modules for parisc
When creating the final layout of a kernel module in memory, allow the
module loader to reserve some additional memory in front of a given section.
This is currently only needed for the parisc port which needs to put the
stub entries there to fulfill the 17/22bit PCREL relocations with large
kernel modules like xfs.

Signed-off-by: Helge Deller <deller@gmx.de>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (renamed fn)
2009-01-05 08:40:13 +10:30
Jianjun Kong
d1e99d7ae4 module: fix warning of unused function when !CONFIG_PROC_FS
Fix this warning:
kernel/module.c:824: warning: ‘print_unload_info’ defined but not used
print_unload_info() just was used when CONFIG_PROC_FS was defined.
This patch mark print_unload_info() inline to solve the problem.

Signed-off-by: Jianjun Kong <jianjun@zeuux.org>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
CC: Ingo Molnar <mingo@elte.hu>
CC: Américo Wang <xiyou.wangcong@gmail.com>
2009-01-05 08:40:11 +10:30
Tim Abbott
ca4787b779 kernel/module.c: compare symbol values when marking symbols as exported in /proc/kallsyms.
When there are two symbols in a module with the same name, one of which is
exported, both will be marked as exported in /proc/kallsyms.  There aren't
any instances of this in the current kernel, but it is easy to construct a
simple module with two compilation units that exhibits the problem.

$ objdump -j .text -t testmod.ko | grep foo
00000000 l     F .text	00000032 foo
00000080 g     F .text	00000001 foo
$ sudo insmod testmod.ko
$ grep "T foo" /proc/kallsyms
c28e8000 T foo	[testmod]
c28e8080 T foo	[testmod]

Fix this by comparing the symbol values once we've found the exported
symbol table entry matching the symbol name.  Tested using Ksplice:

$ ksplice-create --patch=this_commit.patch --id=bar .
$ sudo ksplice-apply ksplice-bar.tar.gz
Done!
$ grep "T foo" /proc/kallsyms
c28e8080 T foo	[testmod]

Signed-off-by: Tim Abbott <tabbott@mit.edu>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-01-05 08:40:11 +10:30
Johannes Berg
a327ca2c26 remove CONFIG_KMOD
Now that nothing depends on it any more, remove CONFIG_KMOD.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-01-05 08:40:10 +10:30
James Morris
5c8c40be4b Merge branch 'master' of git://git.infradead.org/users/pcmoore/lblnet-2.6_next into next 2009-01-05 08:56:01 +11:00
Alessandro Zummo
099e657625 rtc: add alarm/update irq interfaces
Add standard interfaces for alarm/update irqs enabling.  Drivers are no
more required to implement equivalent ioctl code as rtc-dev will provide
it.

UIE emulation should now be handled correctly and will work even for those
RTC drivers who cannot be configured to do both UIE and AIE.

Signed-off-by: Alessandro Zummo <a.zummo@towertech.it>
Cc: David Brownell <david-b@pacbell.net>
Cc: Atsushi Nemoto <anemo@mba.ocn.ne.jp>
Cc: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-04 13:33:20 -08:00
Nick Piggin
54566b2c15 fs: symlink write_begin allocation context fix
With the write_begin/write_end aops, page_symlink was broken because it
could no longer pass a GFP_NOFS type mask into the point where the
allocations happened.  They are done in write_begin, which would always
assume that the filesystem can be entered from reclaim.  This bug could
cause filesystem deadlocks.

The funny thing with having a gfp_t mask there is that it doesn't really
allow the caller to arbitrarily tinker with the context in which it can be
called.  It couldn't ever be GFP_ATOMIC, for example, because it needs to
take the page lock.  The only thing any callers care about is __GFP_FS
anyway, so turn that into a single flag.

Add a new flag for write_begin, AOP_FLAG_NOFS.  Filesystems can now act on
this flag in their write_begin function.  Change __grab_cache_page to
accept a nofs argument as well, to honour that flag (while we're there,
change the name to grab_cache_page_write_begin which is more instructive
and does away with random leading underscores).

This is really a more flexible way to go in the end anyway -- if a
filesystem happens to want any extra allocations aside from the pagecache
ones in ints write_begin function, it may now use GFP_KERNEL (rather than
GFP_NOFS) for common case allocations (eg.  ocfs2_alloc_write_ctxt, for a
random example).

[kosaki.motohiro@jp.fujitsu.com: fix ubifs]
[kosaki.motohiro@jp.fujitsu.com: fix fuse]
Signed-off-by: Nick Piggin <npiggin@suse.de>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: <stable@kernel.org>		[2.6.28.x]
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
[ Cleaned up the calling convention: just pass in the AOP flags
  untouched to the grab_cache_page_write_begin() function.  That
  just simplifies everybody, and may even allow future expansion of the
  logic.   - Linus ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-04 13:33:20 -08:00
Bruno Prémont
e687d691cb viafb: fix crashes due to 4k stack overflow
The function viafb_cursor() uses 2 stack-variables of CURSOR_SIZE bits;
CURSOR_SIZE is defined as (8 * 1024).  Using up twice 1k on stack is too
much for 4k-stack (though it works with 8k-stacks).  Make those two
variables kzalloc'ed to preserve stack space.

Also merge the whole lot of local struct's in viafb_ioctl into a union so
the stack usage gets minimized here as well.  (struct's are only accessed
in their indicidual IOCTL case) This second part is only compile-tested as
I know of no userspace app using the IOCTLs.

Signed-off-by: Bruno Prémont <bonbons@linux-vserver.org>
Cc: <JosephChan@via.com.tw>
Cc: Krzysztof Helt <krzysztof.h1@poczta.fm>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-04 13:33:20 -08:00
Pekka Enberg
c644f0e4b5 fs: introduce bgl_lock_ptr()
As suggested by Andreas Dilger, introduce a bgl_lock_ptr() helper in
<linux/blockgroup_lock.h> and add separate sb_bgl_lock() helpers to
filesystem specific header files to break the hidden dependency to
struct ext[234]_sb_info.

Also, while at it, convert the macros to static inlines to try make up
for all the times I broke Andrew Morton's tree.

Acked-by: Andreas Dilger <adilger@sun.com>
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: <linux-ext4@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-04 13:33:20 -08:00
Randy Dunlap
0a30c5cefa spi.h uses/needs device.h
Include header files as used/needed:

  In file included from drivers/leds/leds-dac124s085.c:16:
  include/linux/spi/spi.h:66: error: field 'dev' has incomplete type
  include/linux/spi/spi.h: In function 'to_spi_device':
  include/linux/spi/spi.h💯 warning: type defaults to 'int' in declaration of '__mptr'
  ...

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Cc: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-04 13:33:20 -08:00
Adam Lackorzynski
2e4e27c7d0 vmalloc.c: fix flushing in vmap_page_range()
The flush_cache_vmap in vmap_page_range() is called with the end of the
range twice.  The following patch fixes this for me.

Signed-off-by: Adam Lackorzynski <adam@os.inf.tu-dresden.de>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-04 13:33:20 -08:00
Li Zefan
7b574b7b01 cgroups: fix a race between cgroup_clone and umount
The race is calling cgroup_clone() while umounting the ns cgroup subsys,
and thus cgroup_clone() might access invalid cgroup_fs, or kill_sb() is
called after cgroup_clone() created a new dir in it.

The BUG I triggered is BUG_ON(root->number_of_cgroups != 1);

  ------------[ cut here ]------------
  kernel BUG at kernel/cgroup.c:1093!
  invalid opcode: 0000 [#1] SMP
  ...
  Process umount (pid: 5177, ti=e411e000 task=e40c4670 task.ti=e411e000)
  ...
  Call Trace:
   [<c0493df7>] ? deactivate_super+0x3f/0x51
   [<c04a3600>] ? mntput_no_expire+0xb3/0xdd
   [<c04a3ab2>] ? sys_umount+0x265/0x2ac
   [<c04a3b06>] ? sys_oldumount+0xd/0xf
   [<c0403911>] ? sysenter_do_call+0x12/0x31
  ...
  EIP: [<c0456e76>] cgroup_kill_sb+0x23/0xe0 SS:ESP 0068:e411ef2c
  ---[ end trace c766c1be3bf944ac ]---

Cc: Serge E. Hallyn <serue@us.ibm.com>
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Cc: Paul Menage <menage@google.com>
Cc: "Serge E. Hallyn" <serue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-04 13:33:19 -08:00
Al Viro
5af75d8d58 audit: validate comparison operations, store them in sane form
Don't store the field->op in the messy (and very inconvenient for e.g.
audit_comparator()) form; translate to dense set of values and do full
validation of userland-submitted value while we are at it.

->audit_init_rule() and ->audit_match_rule() get new values now; in-tree
instances updated.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-01-04 15:14:42 -05:00
Al Viro
36c4f1b18c clean up audit_rule_{add,del} a bit
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-01-04 15:14:42 -05:00
Al Viro
e048e02c89 make sure that filterkey of task,always rules is reported
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-01-04 15:14:42 -05:00