Commit graph

28,545 commits

Author SHA1 Message Date
Grant Likely
d1c3414c2a drivercore: Add driver probe deferral mechanism
Allow drivers to report at probe time that they cannot get all the resources
required by the device, and should be retried at a later time.

This should completely solve the problem of getting devices
initialized in the right order.  Right now this is mostly handled by
mucking about with initcall ordering which is a complete hack, and
doesn't even remotely handle the case where device drivers are in
modules.  This approach completely sidesteps the issues by allowing
driver registration to occur in any order, and any driver can request
to be retried after a few more other drivers get probed.

v4: - Integrate Manjunath's addition of a separate workqueue
    - Change -EAGAIN to -EPROBE_DEFER for drivers to trigger deferral
    - Update comment blocks to reflect how the code really works
v3: - Hold off workqueue scheduling until late_initcall so that the bulk
      of driver probes are complete before we start retrying deferred devices.
    - Tested with simple use cases.  Still needs more testing though.
      Using it to get rid of the gpio early_initcall madness, or to replace
      the ASoC internal probe deferral code would be ideal.
v2: - added locking so it should no longer be utterly broken in that regard
    - remove device from deferred list at device_del time.
    - Still completely untested with any real use case, but has been
      boot tested.

Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Cc: Mark Brown <broonie@opensource.wolfsonmicro.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Dilan Lee <dilee@nvidia.com>
Cc: Manjunath GKondaiah <manjunath.gkondaiah@linaro.org>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Tony Lindgren <tony@atomide.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Reviewed-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Acked-by: David Daney <david.daney@cavium.com>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-03-08 11:53:13 -08:00
Jiri Slaby
c5f0508b99 TTY: amiserial, remove tasklet for tty_wakeup
tty_wakeup is safe to be called from all contexts. No need to schedule
a tasklet for that. Let's call it directly like in other drivers.

This allows us to kill another member of async_struct structure. (If
we remove the dummy uses in simserial.)

Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-03-08 11:48:42 -08:00
Jiri Slaby
9c8efecc91 TTY: serialP, remove unused material
First, remove unused macro and rs_multiport_struct structure. Nobody
uses them at all.

Further, the 2 drivers (they are below) which use the rest of
structures from serialP.h (async_struct and serial_state) do not use
all the members. Remove the members:
* which are unused or
* which are only initialized and never used for something real.

Everybody should avoid the structures with a looong distance.

Finally, remove the ALPHA kludge MCR quirks. They are 1:1 copy from
8250.h. No need to redefine them here.

The 2 promised users of the structures:
arch/ia64/hp/sim/simserial.c
drivers/tty/amiserial.c

Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-03-08 11:48:42 -08:00
Jiri Slaby
44a1bfd95d TTY: serialP, remove DECLARE_WAITQUEUE check
The macro is always defined now. This was there only for historical
reasons.

Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-03-08 11:42:21 -08:00
Jiri Slaby
26b23209c0 TTY: tty_driver, document tty->ops->shutdown limitation
Note that tty->ops->shutdown is called from whatever context the user
drops the last tty reference from. E.g. if one takes a reference in
an ISR, tty close happens on other CPU and the final tty put is from
the ISR, tty->ops->shutdown will be called from that hard irq context.

We would have a problem in vt if we start using tty refcounting from
other contexts than user there. It is because vt's shutdown uses
mutexes. This is yet to be fixed.

Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Reported-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-03-08 11:38:51 -08:00
Jiri Slaby
91cedcde1e TTY: serial, simplify ASYNC_USR_MASK
By using ASYNC_SPD_MASK instead of the single speed bits.

Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-03-08 11:38:50 -08:00
Jiri Slaby
87cab16beb TTY: remove minor_num from tty_driver
It was added back in 2004 and never used for anything real. Remove the
only assignment in the tree as well.

Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-03-08 11:30:21 -08:00
Jiri Slaby
1a54a76d51 TTY: let alloc_tty_driver deduce the owner automatically
Like the rest of the kernel, make a stub from alloc_tty_driver which
calls __alloc_tty_driver with proper owner. This will save us one more
assignment on the driver side.

Also this fixes some drivers which didn't set the owner. This allowed
user to remove the module from the system even though a tty from the
driver is still open.

Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-03-08 11:30:21 -08:00
Alan Cox
079c9534a9 vt:tackle kbd_table
Keyboard struct lifetime is easy, but the locking is not and is completely
ignored by the existing code. Tackle this one head on

- Make the kbd_table private so we can run down all direct users
- Hoick the relevant ioctl handlers into the keyboard layer
- Lock them with the keyboard lock so they don't change mid keypress
- Add helpers for things like console stop/start so we isolate the poking
  around properly
- Tweak the braille console so it still builds

There are a couple of FIXME locking cases left for ioctls that are so hideous
they should be addressed in a later patch. After this patch the kbd_table is
private and all the keyboard jiggery pokery is in one place.

This update fixes speakup and also a memory leak in the original.

Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-03-08 10:50:35 -08:00
Linus Walleij
ffbbdd2132 spi: create a message queueing infrastructure
This rips the message queue in the PL022 driver out and pushes
it into (optional) common infrastructure. Drivers that want to
use the message pumping thread will need to define the new
per-messags transfer methods and leave the deprecated transfer()
method as NULL.

Most of the design is described in the documentation changes that
are included in this patch.

Since there is a queue that need to be stopped when the system
is suspending/resuming, two new calls are implemented for the
device drivers to call in their suspend()/resume() functions:
spi_master_suspend() and spi_master_resume().

ChangeLog v1->v2:
- Remove Kconfig entry and do not make the queue support optional
  at all, instead be more agressive and have it as part of the
  compulsory infrastructure.
- If the .transfer() method is implemented, delete print a small
  deprecation notice and do not start the transfer pump.
- Fix a bitrotted comment.
ChangeLog v2->v3:
- Fix up a problematic sequence courtesy of Chris Blair.
- Stop rather than destroy the queue on suspend() courtesy of
  Chris Blair.

Signed-off-by: Chris Blair <chris.blair@stericsson.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Tested-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Reviewed-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
2012-03-07 19:19:48 -07:00
Pablo Neira Ayuso
24de58f465 netfilter: xt_CT: allow to attach timeout policy + glue code
This patch allows you to attach the timeout policy via the
CT target, it adds a new revision of the target to ensure
backward compatibility. Moreover, it also contains the glue
code to stick the timeout object defined via nfnetlink_cttimeout
to the given flow.

Example usage (it requires installing the nfct tool and
libnetfilter_cttimeout):

1) create the timeout policy:

 nfct timeout add tcp-policy0 inet tcp \
	established 1000 close 10 time_wait 10 last_ack 10

2) attach the timeout policy to the packet:

 iptables -I PREROUTING -t raw -p tcp -j CT --timeout tcp-policy0

You have to install the following user-space software:

a) libnetfilter_cttimeout:
   git://git.netfilter.org/libnetfilter_cttimeout

b) nfct:
   git://git.netfilter.org/nfct

You also have to get iptables with -j CT --timeout support.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-03-07 17:41:28 +01:00
Pablo Neira Ayuso
5097846230 netfilter: add cttimeout infrastructure for fine timeout tuning
This patch adds the infrastructure to add fine timeout tuning
over nfnetlink. Now you can use the NFNL_SUBSYS_CTNETLINK_TIMEOUT
subsystem to create/delete/dump timeout objects that contain some
specific timeout policy for one flow.

The follow up patches will allow you attach timeout policy object
to conntrack via the CT target and the conntrack extension
infrastructure.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-03-07 17:41:22 +01:00
Pablo Neira Ayuso
33ee44643f netfilter: nf_ct_tcp: move retransmission and unacknowledged timeout to array
This patch moves the retransmission and unacknowledged timeouts
to the tcp_timeouts array. This change is required by follow-up
patches.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-03-07 17:41:15 +01:00
WANG Cong
5f1f815103 netfilter: remove ipt_SAME.h and ipt_realm.h
These two headers are not required anymore, they have been
replaced by xt_SAME.h and xt_realm.h.

Florian Westphal pointed out this.

Cc: "David S. Miller" <davem@davemloft.net>
Cc: Florian Westphal <fw@strlen.de>
Signed-off-by: WANG Cong <xiyou.wangcong@gmail.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-03-07 17:40:56 +01:00
Richard Weinberger
6939c33a75 netfilter: merge ipt_LOG and ip6_LOG into xt_LOG
ipt_LOG and ip6_LOG have a lot of common code, merge them
to reduce duplicate code.

Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-03-07 17:40:49 +01:00
Pablo Neira Ayuso
544d5c7d9f netfilter: ctnetlink: allow to set expectfn for expectations
This patch allows you to set expectfn which is specifically used
by the NAT side of most of the existing conntrack helpers.

I have added a symbol map that uses a string as key to look up for
the function that is attached to the expectation object. This is
the best solution I came out with to solve this issue.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-03-07 17:40:46 +01:00
Pablo Neira Ayuso
076a0ca026 netfilter: ctnetlink: add NAT support for expectations
This patch adds the missing bits to create expectations that
are created in NAT setups.
2012-03-07 17:40:44 +01:00
Pablo Neira Ayuso
b8c5e52c13 netfilter: ctnetlink: allow to set expectation class
This patch allows you to set the expectation class.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-03-07 17:40:42 +01:00
Jozsef Kadlecsik
7f81c951d9 netfilter: ipset: hash:net,iface timeout bug fixed
Timed out entries were still matched till the garbage collector
purged them out. The fix is verified in the testsuite.

Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-03-07 17:40:37 +01:00
Jozsef Kadlecsik
2a7cef2a4b netfilter: ipset: Exceptions support added to hash:*net* types
The "nomatch" keyword and option is added to the hash:*net* types,
by which one can add exception entries to sets. Example:

        ipset create test hash:net
        ipset add test 192.168.0/24
        ipset add test 192.168.0/30 nomatch

In this case the IP addresses from 192.168.0/24 except 192.168.0/30
match the elements of the set.

Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-03-07 17:40:35 +01:00
Jozsef Kadlecsik
0927a1ac63 netfilter: ipset: Log warning when a hash type of set gets full
If the set is full, the SET target cannot add more elements.
Log warning so that the admin got notified about it.

Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-03-07 17:40:33 +01:00
Jan Engelhardt
ae8ded1cb8 netfilter: ipset: expose userspace-relevant parts in ip_set.h
iptables's libxt_SET.c depends on these.

Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-03-07 17:40:31 +01:00
Jan Engelhardt
c15f1c8325 netfilter: ipset: use NFPROTO_ constants
ipset is actually using NFPROTO values rather than AF (xt_set passes
that along).

Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-03-07 17:40:29 +01:00
Linus Torvalds
4f262acfde Merge branch 'fixes' of git://git.linaro.org/people/rmk/linux-arm
Pull ARM updates from Russell King.

* 'fixes' of git://git.linaro.org/people/rmk/linux-arm:
  ARM: 7358/1: perf: add PMU hotplug notifier
  ARM: 7357/1: perf: fix overflow handling for xscale2 PMUs
  ARM: 7356/1: perf: check that we have an event in the PMU IRQ handlers
  ARM: 7355/1: perf: clear overflow flag when disabling counter on ARMv7 PMU
  ARM: 7354/1: perf: limit sample_period to half max_period in non-sampling mode
  ARM: ecard: ensure fake vma vm_flags is setup
  ARM: 7346/1: errata: fix PL310 erratum #753970 workaround selection
  ARM: 7345/1: errata: update workaround for A9 erratum #743622
  ARM: 7348/1: arm/spear600: fix one-shot timer
  ARM: 7339/1: amba/serial.h: Include types.h for resolving dependency of type bool
2012-03-07 08:33:03 -08:00
Weston Andros Adamson
d8e0539ebd NFS: add filehandle crc for debug display
Match wireshark's CRC-32 hash for easier debugging

Signed-off-by: Weston Andros Adamson <dros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-03-07 10:34:07 -05:00
Andrew Morton
95468fd41a writeback: fix typo in the writeback_control comment
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2012-03-07 16:08:46 +01:00
Anton Blanchard
b1ada6010e atomic: Allow atomic_inc_not_zero to be overridden
We want to implement a ppc64 specific version of atomic_inc_not_zero
so wrap it in an ifdef to allow it to be overridden.

Signed-off-by: Anton Blanchard <anton@samba.org>
Acked-by: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-03-07 17:06:08 +11:00
Or Gerlitz
8154c07fe1 mlx4_core: Get rid of redundant ext_port_cap flags
While doing the work for commit a6f7feae6d ("IB/mlx4: pass SMP
vendor-specific attribute MADs to firmware") we realized that the
firmware would respond on all sorts of vendor-specific MADs.
Therefore commit 97285b7817 ("mlx4_core: Add extended port
capabilities support") adds redundant code into the driver, since
there's no real reaon to maintain the extended capabilities of the
port, as they can be queried on demand (e.g the FDR10 capability).

This patch reverts commit 97285b7817 and removes the check for
extended caps from the mlx4_ib driver port query flow.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-03-06 17:25:18 -08:00
Stephen Warren
0acfb076f7 pinctrl: forward-declare struct device
Add a dummy declaration of struct device to avoid the following warning:

In file included from include/linux/pinctrl/machine.h:15:0,
                 from arch/arm/mach-tegra/board-pinmux.h:18,
                 from arch/arm/mach-tegra/board-trimslice-pinmux.c:20:
include/linux/pinctrl/pinctrl.h:115:12: warning: 'struct device' declared inside parameter list [enabled by default]
include/linux/pinctrl/pinctrl.h:115:12: warning: its scope is only this definition or declaration, which is probably not what you want [enabled by default]

Signed-off-by: Stephen Warren <swarren@wwwdotorg.org>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2012-03-06 23:29:05 +01:00
Linus Walleij
9a01be1715 pinctrl: split pincontrol states into its own header
Move the pin control state defines into its own header file,
since it is used both by machine.h which is facing the platform
and by consumer.h which is facing the drivers, and pinctrl.h
which is pinctrl-driver internal, let's not have each and every
.h file include all others, then isolation is moot.

Acked-by: Stephen Warren <swarren@wwwdotorg.org>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2012-03-06 23:08:19 +01:00
David S. Miller
95f050bf7f net: Use bool for return value of dev_valid_name().
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-03-06 16:12:15 -05:00
Yevgeny Petrilin
9a9a232a92 net/mlx4: fixing sparse warnings for not declared, functions
The SET_PORT functions are implemented in port.c, which is part
of mlx4_core, these functions are exported. The functions are in use by
the mlx4_en module (were originally part of mlx4_en).
Their declaration remained in mlx4_en module, moving the declaration to the right location.

Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-03-06 15:19:18 -05:00
Samuel Ortiz
c970a1ac4e NFC: Add device powered netlink attribute
For user space to know if a device is up or down.

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-03-06 15:16:19 -05:00
Arend van Spriel
10d8493cd9 bcma: add support for on-chip OTP memory used for SPROM storage
Wireless Broadcom chips can have either their SPROM data stored
on either external SPROM or on-chip OTP memory. Both are accessed
through the same register space. This patch adds support for the
on-chip OTP memory.

Tested with:
BCM43224 OTP and SPROM
BCM4331 SPROM
BCM4313 OTP

This patch is in response to linux-wireless thread [1].

[1] http://article.gmane.org/gmane.linux.kernel.wireless.general/85426

Tested-by: Saul St. John <saul.stjohn@gmail.com>
Tested-by: Rafal Milecki <zajec5@gmail.com>
Tested-by: Hauke Mehrtens <hauke@hauke-m.de>
Cc: Larry Finger <Larry.Finger@lwfinger.net>
Signed-off-by: Arend van Spriel <arend@broadcom.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-03-06 15:16:18 -05:00
Johannes Berg
804483e907 cfg80211/mac80211: report signal strength for mgmt frames
Add the signal strength (in dBm only for now) to
frames that are received via nl80211's various
frame APIs.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Acked-by: Kalle Valo <kvalo@qca.qualcomm.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-03-06 15:16:05 -05:00
Trond Myklebust
2d2f24add1 NFSv4: Simplify the struct nfs4_stateid
Replace the union with the common struct stateid4 as defined in both
RFC3530 and RFC5661. This makes it easier to access the sequence id,
which will again make implementing support for parallel OPEN calls
easier.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-03-06 10:32:47 -05:00
Tomi Valkeinen
0ba86d7ede Merge commit 'v3.3-rc6'
Merge v3.3-rc6 to get the latest DSS and OMAP arch fixes.

Conflicts:
	arch/arm/mach-omap1/board-innovator.c
	drivers/video/omap2/dss/apply.c
2012-03-06 13:20:31 +02:00
Florian Tobias Schandinat
d282e4d935 Merge commit 'v3.3-rc6' into fbdev-next 2012-03-06 08:09:16 +00:00
Alan Cox
33e9970add x86/mid: Kill off Moorestown
All production devices operate in the Oaktrail configuration
with legacy PC elements present and an ACPI BIOS.  Continue
stripping out the Moorestown elements from the tree leaving
Medfield.

Signed-off-by: Alan Cox <alan@linux.intel.com>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Cc: Matthew Garrett <mjg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/n/tip-z0dxy88f949rvxo5vvd08ybs@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2012-03-06 09:01:24 +01:00
David S. Miller
f6a1ad4295 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/vmxnet3/vmxnet3_drv.c

Small vmxnet3 conflict with header size bug fix in 'net'.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-03-05 21:16:26 -05:00
Linus Torvalds
3e85fb9cd4 Merge branch 'akpm' (Andrew's patch bomb)
Merge the emailed seties of 19 patches from Andrew Morton

* akpm:
  rapidio/tsi721: fix queue wrapping bug in inbound doorbell handler
  memcg: fix mapcount check in move charge code for anonymous page
  mm: thp: fix BUG on mm->nr_ptes
  alpha: fix 32/64-bit bug in futex support
  memcg: fix GPF when cgroup removal races with last exit
  debugobjects: Fix selftest for static warnings
  floppy/scsi: fix setting of BIO flags
  memcg: fix deadlock by inverting lrucare nesting
  drivers/rtc/rtc-r9701.c: fix crash in r9701_remove()
  c2port: class_create() returns an ERR_PTR
  pps: class_create() returns an ERR_PTR, not NULL
  hung_task: fix the broken rcu_lock_break() logic
  vfork: kill PF_STARTING
  coredump_wait: don't call complete_vfork_done()
  vfork: make it killable
  vfork: introduce complete_vfork_done()
  aio: wake up waiters when freeing unused kiocbs
  kprobes: return proper error code from register_kprobe()
  kmsg_dump: don't run on non-error paths by default
2012-03-05 15:50:25 -08:00
Hugh Dickins
7512102cf6 memcg: fix GPF when cgroup removal races with last exit
When moving tasks from old memcg (with move_charge_at_immigrate on new
memcg), followed by removal of old memcg, hit General Protection Fault in
mem_cgroup_lru_del_list() (called from release_pages called from
free_pages_and_swap_cache from tlb_flush_mmu from tlb_finish_mmu from
exit_mmap from mmput from exit_mm from do_exit).

Somewhat reproducible, takes a few hours: the old struct mem_cgroup has
been freed and poisoned by SLAB_DEBUG, but mem_cgroup_lru_del_list() is
still trying to update its stats, and take page off lru before freeing.

A task, or a charge, or a page on lru: each secures a memcg against
removal.  In this case, the last task has been moved out of the old memcg,
and it is exiting: anonymous pages are uncharged one by one from the
memcg, as they are zapped from its pagetables, so the charge gets down to
0; but the pages themselves are queued in an mmu_gather for freeing.

Most of those pages will be on lru (and force_empty is careful to
lru_add_drain_all, to add pages from pagevec to lru first), but not
necessarily all: perhaps some have been isolated for page reclaim, perhaps
some isolated for other reasons.  So, force_empty may find no task, no
charge and no page on lru, and let the removal proceed.

There would still be no problem if these pages were immediately freed; but
typically (and the put_page_testzero protocol demands it) they have to be
added back to lru before they are found freeable, then removed from lru
and freed.  We don't see the issue when adding, because the
mem_cgroup_iter() loops keep their own reference to the memcg being
scanned; but when it comes to mem_cgroup_lru_del_list().

I believe this was not an issue in v3.2: there, PageCgroupAcctLRU and
PageCgroupUsed flags were used (like a trick with mirrors) to deflect view
of pc->mem_cgroup to the stable root_mem_cgroup when neither set.
38c5d72f3e ("memcg: simplify LRU handling by new rule") mercifully
removed those convolutions, but left this General Protection Fault.

But it's surprisingly easy to restore the old behaviour: just check
PageCgroupUsed in mem_cgroup_lru_add_list() (which decides on which lruvec
to add), and reset pc to root_mem_cgroup if page is uncharged.  A risky
change?  just going back to how it worked before; testing, and an audit of
uses of pc->mem_cgroup, show no problem.

And there's a nice bonus: with mem_cgroup_lru_add_list() itself making
sure that an uncharged page goes to root lru, mem_cgroup_reset_owner() no
longer has any purpose, and we can safely revert 4e5f01c2b9 ("memcg:
clear pc->mem_cgroup if necessary").

Calling update_page_reclaim_stat() after add_page_to_lru_list() in swap.c
is not strictly necessary: the lru_lock there, with RCU before memcg
structures are freed, makes mem_cgroup_get_reclaim_stat_from_page safe
without that; but it seems cleaner to rely on one dependency less.

Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Konstantin Khlebnikov <khlebnikov@openvz.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-05 15:49:43 -08:00
Oleg Nesterov
6e27f63edb vfork: kill PF_STARTING
Previously it was (ab)used by utrace.  Then it was wrongly used by the
scheduler code.

Currently it is not used, kill it before it finds the new erroneous user.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-05 15:49:42 -08:00
Oleg Nesterov
57b59c4a14 coredump_wait: don't call complete_vfork_done()
Now that CLONE_VFORK is killable, coredump_wait() no longer needs
complete_vfork_done().  zap_threads() should find and kill all tasks with
the same ->mm, this includes our parent if ->vfork_done is set.

mm_release() becomes the only caller, unexport complete_vfork_done().

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-05 15:49:42 -08:00
Oleg Nesterov
d68b46fe16 vfork: make it killable
Make vfork() killable.

Change do_fork(CLONE_VFORK) to do wait_for_completion_killable().  If it
fails we do not return to the user-mode and never touch the memory shared
with our child.

However, in this case we should clear child->vfork_done before return, we
use task_lock() in do_fork()->wait_for_vfork_done() and
complete_vfork_done() to serialize with each other.

Note: now that we use task_lock() we don't really need completion, we
could turn task->vfork_done into "task_struct *wake_up_me" but this needs
some complications.

NOTE: this and the next patches do not affect in-kernel users of
CLONE_VFORK, kernel threads run with all signals ignored including
SIGKILL/SIGSTOP.

However this is obviously the user-visible change.  Not only a fatal
signal can kill the vforking parent, a sub-thread can do execve or
exit_group() and kill the thread sleeping in vfork().

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-05 15:49:42 -08:00
Oleg Nesterov
c415c3b47e vfork: introduce complete_vfork_done()
No functional changes.

Move the clear-and-complete-vfork_done code into the new trivial helper,
complete_vfork_done().

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-05 15:49:42 -08:00
Matthew Garrett
c22ab33290 kmsg_dump: don't run on non-error paths by default
Since commit 04c6862c05 ("kmsg_dump: add kmsg_dump() calls to the
reboot, halt, poweroff and emergency_restart paths"), kmsg_dump() gets
run on normal paths including poweroff and reboot.

This is less than ideal given pstore implementations that can only
represent single backtraces, since a reboot may overwrite a stored oops
before it's been picked up by userspace.  In addition, some pstore
backends may have low performance and provide a significant delay in
reboot as a result.

This patch adds a printk.always_kmsg_dump kernel parameter (which can also
be changed from userspace).  Without it, the code will only be run on
failure paths rather than on normal paths.  The option can be enabled in
environments where there's a desire to attempt to audit whether or not a
reboot was cleanly requested or not.

Signed-off-by: Matthew Garrett <mjg@redhat.com>
Acked-by: Seiji Aguchi <seiji.aguchi@hds.com>
Cc: Seiji Aguchi <seiji.aguchi@hds.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Marco Stornelli <marco.stornelli@gmail.com>
Cc: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Don Zickus <dzickus@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-05 15:49:42 -08:00
Linus Torvalds
aa139092de Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

1) TCP SACK processing can calculate an incorrect reordering value in
   some cases, fix from Neal Cardwell.

2) tcp_mark_head_lost() can split SKBs in situations where it should
   not, violating send queue invariants expected by other pieces of
   code and thus resulting (eventually) in corrupted retransmit state
   counters.  Also from Neal Cardwell.

3) qla3xxx erroneously calls spin_lock_irqrestore() with constant
   hw_flags of zero.  Fix from Santosh Nayak.

4) Fix NULL deref in rt2x00, from Gabor Juhos.

5) pch_gbe passes address of wrong typed object to pch_gbe_validate_option
   thus corrupting part of the value.  From Dan Carpenter.

6) We must check the return value of nlmsg_parse() before trying to use
   the results.  From Eric Dumazet.

7) Bridging code fails to check return value of ipv6_dev_get_saddr()
   thus potentially leaving uninitialized garbage in the outgoing ipv6
   header.  From Ulrich Weber.

8) Due to rounding and a reversed operation on jiffies, bridge message
   ages can go backwards instead of forwards, thus breaking STP.  Fixes
   from Joakim Tjernlund.

9) r8169 modifies Config* registers without properly holding the
   Config9346 lock, resulting in corrupted IP fragments on some chips.
   Fix from Francois Romieu.

10) NET_PACKET_ENGINE default wan't set properly during the network
   driver mega-move.  Fix from Stephen Hemminger.

11) vmxnet3 uses TCP header size where it actually should use the UDP
   header size, fix from Shreyas Bhatewara.

12) Netfilter bridge module autoload is busted in the compat case, fix
   from Florian Westphal.

13) Wireless Key removal was not setting multicast bits correctly thus
   accidently killing the unicast key 0 and thus all traffic stops.
   Fix from Johannes Berg.

14) Fix endless retries of A-MPDU transmissions in brcm80211 driver.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (22 commits)
  qla3xxx: ethernet: Fix bogus interrupt state flag.
  bridge: check return value of ipv6_dev_get_saddr()
  rtnetlink: fix rtnl_calcit() and rtnl_dump_ifinfo()
  bridge: message age needs to increase, not decrease.
  bridge: Adjust min age inc for HZ > 256
  tcp: don't fragment SACKed skbs in tcp_mark_head_lost()
  r8169: corrupted IP fragments fix for large mtu.
  packetengines: fix config default
  vmxnet3: Fix transport header size
  enic: fix an endian bug in enic_probe()
  pch_gbe: memory corruption calling pch_gbe_validate_option()
  tg3: Fix tg3_get_stats64 for 5700 / 5701 devs
  tcp: fix false reordering signal in tcp_shifted_skb
  tcp: fix comment for tp->highest_sack
  netfilter: bridge: fix module autoload in compat case
  brcm80211: smac: only print block-ack timeout message at trace level
  brcm80211: smac: fix endless retry of A-MPDU transmissions
  mac80211: Fix a warning on changing to monitor mode from STA
  mac80211: zero initialize count field in ieee80211_tx_rate
  iwlwifi: fix key removal
  ...
2012-03-05 14:30:54 -08:00
Linus Torvalds
789ce9b9c2 Merge branch 'for-3.3-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu
Pull per-cpu patches from Tejun Heo:
 "This pull request contains four patches.  One replaces manual clearing
  with bitmap_clear(), two fix generic definition of __this_cpu ops so
  that they don't choose unnecessarily strict arch version.  One makes
  _this_cpu definition use raw_local_irq_*() so that it doesn't end up
  wrecking irq on/off state tracking when used from inside lockdep.

  Of the four patches, the raw_local_irq_*() update is the most
  important, so please feel free to cherry pick only that one patch and
  ignore the rest if you want to - commit e920d5971d 'percpu: use
  raw_local_irq_* in _this_cpu op'."

* 'for-3.3-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu:
  percpu: fix __this_cpu_{sub,inc,dec}_return() definition
  percpu: use raw_local_irq_* in _this_cpu op
  percpu: fix generic definition of __this_cpu_add_and_return()
  percpu: use bitmap_clear
2012-03-05 14:28:36 -08:00
David S. Miller
7c1e51a34a Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next 2012-03-05 16:39:02 -05:00