Commit graph

17,955 commits

Author SHA1 Message Date
Chia-chi Yeh
c30cd45aad security: Add AID_NET_RAW and AID_NET_ADMIN capability check in cap_capable().
Signed-off-by: Chia-chi Yeh <chiachi@android.com>
2010-02-08 15:35:52 -08:00
Chia-chi Yeh
286fcd1c1e net: PPPoPNS and PPPoLAC fixes.
net: Fix a bitmask in PPPoPNS and rename constants in PPPoPNS and PPPoLAC.

Signed-off-by: Chia-chi Yeh <chiachi@android.com>

net: Fix a potential deadlock while releasing PPPoLAC/PPPoPNS socket.

PPP driver guarantees that no thread will be executing start_xmit() after
returning from ppp_unregister_channel(). To achieve this, a spinlock (downl)
is used. In pppolac_release(), ppp_unregister_channel() is called after sk_udp
is locked. At the same time, another thread might be running in pppolac_xmit()
with downl. Thus a deadlock will occur if the thread tries to lock sk_udp.
The same situation might happen on sk_raw in pppopns_release().

Signed-off-by: Chia-chi Yeh <chiachi@android.com>

net: Force PPPoLAC and PPPoPNS to bind an interface before creating PPP channel.

It is common to manipulate the routing table after configuring PPP device.
Since both PPPoLAC and PPPoPNS run over IP, care must be taken to make sure
that there is no loop in the routing table.
Although this can be done by adding a host route, it might still cause
problems when the interface is down for some reason.

To solve this, this patch forces both drivers to bind an interface before
creating PPP channel, so the system will not re-route the tunneling sockets
to another interface when the original one is down. Another benefit is that
now the host route is no longer required, so there is no need to remove it
when PPP channel is closed.

Signed-off-by: Chia-chi Yeh <chiachi@android.com>

net: Avoid sleep-inside-spinlock in PPPoLAC and PPPoPNS.

Since recv() and xmit() are called with a spinlock held, routines which might
sleep cannot be used. This issue is solved by following changes:

Incoming packets are now processed in backlog handler, recv_core(), instead of
recv(). Since backlog handler is always executed with socket spinlock held, the
requirement of ppp_input() is still satisfied.

Outgoing packets are now processed in workqueue handler, xmit_core(), instead of
xmit(). Note that kernel_sendmsg() is no longer used to prevent touching dead
sockets.

In release(), lock_sock() and pppox_unbind_sock() ensure that no thread is in
recv_core() or xmit(). Then socket handlers are restored before release_sock(),
so no packets will leak in backlog queue.

Signed-off-by: Chia-chi Yeh <chiachi@android.com>

net: Fix msg_iovlen in PPPoLAC and PPPoPNS.

Although any positive value should work (which is always true in both drivers),
the correct value should be 1.

Signed-off-by: Chia-chi Yeh <chiachi@android.com>
2010-02-08 15:34:48 -08:00
Chia-chi Yeh
7cc7c63e62 net: add PPP on PPTP Network Server (PPPoPNS) driver.
Signed-off-by: Chia-chi Yeh <chiachi@android.com>
2010-02-08 15:09:14 -08:00
Chia-chi Yeh
dbdb1c3278 net: add PPP on L2TP Access Concentrator (PPPoLAC) driver.
Signed-off-by: Chia-chi Yeh <chiachi@android.com>
2010-02-08 15:09:14 -08:00
Dmitry Shmidt
17a60bbffc tiwlan: Add abstract wifi control functions support 2010-02-08 15:09:05 -08:00
Krishna, Vamsi
dd5acba95d USB: gadget: android: android USB gadget improvements:
USB: android gadget: add remote wakeup attribute to android function

Add remote wakeup attribute to configuration descriptor of android
function to advertise remote wakeup capability to host

Acked-by: Allam, Suresh Reddy <sallam@qualcomm.com>

Signed-off-by: Mike Lockwood <lockwood@android.com>

USB: gadget: android: Allow functions to handle setup requests.

Signed-off-by: Mike Lockwood <lockwood@android.com>

Support for specifying the list of USB functions from platform data.

The main android.c gadget driver no longer has hard coded references
to the mass_storage and adb functions.

Support for computing the product ID based on tables in platform data
and the currently enabled functions.

Moved the adb enable/disable logic from android.c to f_adb.c.

Change-Id: I6259d3fb1473ed973f700e55d17744956f3527bb
Signed-off-by: Mike Lockwood <lockwood@android.com>
2010-02-08 15:09:03 -08:00
Mike Lockwood
ee26eaf139 USB: composite: Add flag to usb_function to hide its interface during enumeration
Change-Id: Ie999b5190e3e2b6fd23015b8e796cdd178829929

Signed-off-by: Mike Lockwood <lockwood@android.com>
2010-02-08 15:09:02 -08:00
Mike Lockwood
2c7c4829ac android_usb: Composite USB gadget driver for android.
Signed-off-by: Mike Lockwood <lockwood@android.com>

USB: android gadget: add remote wakeup attribute to android function

Add remote wakeup attribute to configuration descriptor of android
function to advertise remote wakeup capability to host

Acked-by: Allam, Suresh Reddy <sallam@qualcomm.com>

Signed-off-by: Mike Lockwood <lockwood@android.com>

usb gadget: link fixes for android composite gadget

Signed-off-by: Mike Lockwood <lockwood@android.com>

usb gadget: Fix null pointer errors in android composite driver

Signed-off-by: Mike Lockwood <lockwood@android.com>

usb: gadget: android: Allow usb charging to draw up to 500mA instead of 250.

Signed-off-by: Rebecca Schultz Zavin <rebecca@android.com>

usb gadget: android: Add helper function for usb_gadget_connect and disconnect.

Signed-off-by: Mike Lockwood <lockwood@android.com>

drivers: usb: gadget: call switch_dev_unregister in mass storage unbind callback

This fixes a problem unloading the android gadget driver when built as a module

Signed-off-by: Mike Lockwood <lockwood@android.com>

usb: gadget: android: Add dependency on switch driver.

Signed-off-by: Mike Lockwood <lockwood@android.com>

USB: gadget: android: Fix USB WHQL Certification Issues

Submitted on behalf of RaviKumar Vembu <ravi.v@motorola.com>
Signed-off-by: Jared Suttles <jared.suttles@motorola.com>
Signed-off-by: Mike Lockwood <lockwood@android.com>

drivers: usb: gadget: Add "usb_mass_storage" platform driver.

This will be used for configuring vendor, product and release from board file.

Submitted on behalf of RaviKumar Vembu <ravi.v@motorola.com>
Signed-off-by: Jared Suttles <jared.suttles@motorola.com>
Signed-off-by: Mike Lockwood <lockwood@android.com>

drivers: usb: gadget: Use usb_mass_storage platform device as parent for lun

If a platform device is specified for the f_mass_storage function, use it as the
parent driver for the lun files in sysfs.
This allows a platform independent file path for controlling USB mass storage
from user space.

Signed-off-by: Mike Lockwood <lockwood@android.com>

drivers: usb: gadget: Add platform data struct for usb_mass_storage device

Signed-off-by: Mike Lockwood <lockwood@android.com>

usb: gadget: mass_storage: Fix Mass Storage Panic during PC reboot

Submitted on behalf of RaviKumar Vembu <ravi.v@motorola.com>
Signed-off-by: Jared Suttles <jared.suttles@motorola.com>
Signed-off-by: Mike Lockwood <lockwood@android.com>

usb: gadget: f_mass_storage: Handle setup request correctly

Signed-off-by: Mike Lockwood <lockwood@android.com>

usb: gadget: f_mass_storage: Clean up wakelocks on error paths

Signed-off-by: Rebecca Schultz Zavin <rebecca@android.com>
Signed-off-by: Mike Lockwood <lockwood@android.com>
2010-02-08 15:08:50 -08:00
San Mehat
40594a0045 mmc: core: Add deferred bus resume policy.
A card driver can now specify that the underlying bus should *not*
auto-resume with the rest of the system. This is useful for reducing resume
latency as well as saving power when the card driver is not using the
bus. In the future, we'll add support for manual suspend

Signed-off-by: San Mehat <san@google.com>
2010-02-08 15:08:29 -08:00
Dmitry Shmidt
8c542b3a4e trout: Add functions for WiFi 2010-02-08 15:07:54 -08:00
San Mehat
6558addc2a mmc: Add concept of an 'embedded' SDIO device.
This is required to support chips which use SDIO for signaling/
communication but do not implement the various card enumeration registers
as required for full SD / SDIO cards.

mmc: sdio: Fix bug where we're freeing the CIS tables we never allocated when using EMBEDDED_SDIO
mmc: Add max_blksize to embedded SDIO data

Signed-off-by: San Mehat <san@google.com>
2010-02-08 15:07:54 -08:00
San Mehat
cfd429c655 mmc: Add status IRQ and status callback function to mmc platform data
Signed-off-by: San Mehat <san@google.com>
2010-02-08 15:07:53 -08:00
Rebecca Schultz Zavin
252e0e32ac Input: synaptics_i2c_rmi: Add irqflags to platform data struct to pass them to driver
Signed-off-by: Rebecca Schultz Zavin <rebecca@android.com>
Signed-off-by: Arve Hjønnevåg <arve@android.com>
2010-02-08 15:07:52 -08:00
Arve Hjønnevåg
3929c1894c Input: synaptics_i2c_rmi: Add sensitivity adjust option.
Signed-off-by: Arve Hjønnevåg <arve@android.com>
2010-02-08 15:07:51 -08:00
Arve Hjønnevåg
cf8b77e6a6 Input: synaptics_i2c_rmi: Driver for Synaptics Touchscreens using RMI over I2C.
Signed-off-by: Arve Hjønnevåg <arve@android.com>

Input: synaptics_i2c_rmi: disable_irq -> disable_irq_nosync

Also remove duplicate swap macro

Signed-off-by: Arve Hjønnevåg <arve@android.com>
2010-02-08 15:07:50 -08:00
Mike Lockwood
6cd91b7b3e input: keychord: Add keychord driver
This driver allows userspace to receive notification when client
specified key combinations are pressed.
The client opens /dev/keychord and writes a list of keychords
for the driver to monitor.
The client then reads or polls /dev/keychord for notifications.
A client specified ID for the keychord is returned from read()
when a keychord press is detected.

Signed-off-by: Mike Lockwood <lockwood@android.com>

keychord: fix to build without CONFIG_PREEMPT

Change-Id: I911f13aeda4224b6fa57863bc7e8972fec8837fb
2010-02-08 15:07:49 -08:00
Arve Hjønnevåg
d94329d008 input: Add keyreset driver.
Add a platform device in the board file to specify a reset key-combo.
The first time the key-combo is detected a work function that syncs
the filesystems is scheduled. If all the keys are released and then
pressed again, it calls panic. Reboot on panic should be set for
this to work.

Signed-off-by: Arve Hjønnevåg <arve@android.com>
2010-02-08 15:07:49 -08:00
Arve Hjønnevåg
f263c5383a Input: gpio_event: Allow multiple input devices per gpio_event device
This is needed to support devices that put non-keyboard buttons in
the keyboard matrix. For instance several devices put the trackball
button in the keyboard matrix. In this case BTN_MOUSE should be
reported from the same input device as REL_X/Y.

It is also useful for devices that have multiple logical keyboard in
the same matrix. The HTC dream has a menu key on the external keyboard
and another menu key on the slide-out keyboard. With a single input
device only one of these menu keys can be mapped to KEY_MENU.

Signed-off-by: Arve Hjønnevåg <arve@android.com>
2010-02-08 15:07:48 -08:00
Arve Hjønnevåg
cbc1c940b5 Input: Generic GPIO Input device.
Supports keyboard matrixces, direct inputs, direct outputs and axes connected to gpios.

Signed-off-by: Arve Hjønnevåg <arve@android.com>
Signed-off-by: Nick Pelly <npelly@google.com>
2010-02-08 15:07:46 -08:00
Mike Lockwood
e563c281d1 FAT: Add new ioctl VFAT_IOCTL_GET_VOLUME_ID for reading the volume ID.
Signed-off-by: Brian Swetland <swetland@google.com>
2010-02-08 15:07:45 -08:00
Mike Chan
ab6c8fc0de uidstat: Adding uid stat driver to collect network statistics.
Signed-off-by: Mike Chan <mike@android.com>
2010-02-08 15:07:45 -08:00
Robert Love
e97bf10949 net: socket ioctl to reset connections matching local address
Introduce a new socket ioctl, SIOCKILLADDR, that nukes all sockets
bound to the same local address. This is useful in situations with
dynamic IPs, to kill stuck connections.

Signed-off-by: Brian Swetland <swetland@google.com>

net: fix tcp_v4_nuke_addr

Signed-off-by: Dima Zavin <dima@android.com>

net: ipv4: Fix a spinlock recursion bug in tcp_v4_nuke.

We can't hold the lock while calling to tcp_done(), so we drop
it before calling. We then have to start at the top of the chain again.

Signed-off-by: Dima Zavin <dima@android.com>

net: ipv4: Fix race in tcp_v4_nuke_addr().

To fix a recursive deadlock in 2.6.29, we stopped holding the hash table lock
across tcp_done() calls. This fixed the deadlock, but introduced a race where
the socket could die or change state.

Fix: Before unlocking the hash table, we grab a reference to the socket. We
can then unlock the hash table without risk of the socket going away. We then
lock the socket, which is safe because it is pinned. We can then call
tcp_done() without recursive deadlock and without race. Upon return, we unlock
the socket and then unpin it, killing it.

Change-Id: Idcdae072b48238b01bdbc8823b60310f1976e045
Signed-off-by: Robert Love <rlove@google.com>
Acked-by: Dima Zavin <dima@android.com>

ipv4: disable bottom halves around call to tcp_done().

Signed-off-by: Robert Love <rlove@google.com>
Signed-off-by: Colin Cross <ccross@android.com>

ipv4: Move sk_error_report inside bh_lock_sock in tcp_v4_nuke_addr

When sk_error_report is called, it wakes up the user-space thread, which then
calls tcp_close.  When the tcp_close is interrupted by the tcp_v4_nuke_addr
ioctl thread running tcp_done, it leaks 392 bytes and triggers a WARN_ON.

This patch moves the call to sk_error_report inside the bh_lock_sock, which
matches the locking used in tcp_v4_err.

Signed-off-by: Colin Cross <ccross@android.com>
2010-02-08 15:07:36 -08:00
Robert Love
97f93ad8e2 ashmem for 2.6.27.
Forward port of ashmem to 2.6.27.

Signed-off-by: Robert Love <rlove@google.com>

ashmem: Don't install fault handler for private mmaps.

Ashmem is used to create named private heaps. If this heap is backed
by a tmpfs file it will allocate two pages for every page touched.
In 2.6.27, the extra page would later be freed, but 2.6.29 does not
scan anonymous pages when running without swap so the memory is not
freed while the file is referenced. This change changes the behavior
of private ashmem mmaps to match /dev/zero instead tmpfs.

Signed-off-by: Arve Hjønnevåg <arve@android.com>

ashmem: Add common prefix to name reported in /proc/pid/maps

Signed-off-by: Arve Hjønnevåg <arve@android.com>

ashmem: don't require a page aligned size

This makes ashmem more similar to shmem and mmap, by
not requiring the specified size to be page aligned,
instead rounding it internally as needed.

Signed-off-by: Marco Nelissen <marcone@android.com>
2010-02-08 15:07:23 -08:00
Robert Love
d69e8af598 Add android_aid.h
Add <linux/android_aid.h>, our mapping of AID defines to gid numbers.

Signed-off-by: Robert Love <rlove@google.com>
2010-02-08 15:07:20 -08:00
Rebecca Schultz
653d96473f pmem: Add pmem driver
Signed-off-by: Rebecca Schultz <rschultz@google.com>

pmem: Use the thread group leader insted of the current thread.

Instead of keeping track of the current thread, use the thread group leader

Signed-off-by: Rebecca Schultz <rschultz@google.com>

pmem: Add some apis to reference and flush pmem files by file struct

The api to refer to pmem files by fd should be depricated, it can
cause problems if a processes fd table changes while the kernel is processing
data in a pmem file.  This change adds the safer api.

Signed-off-by: Rebecca Schultz Zavin <rebecca@android.com>

pmem: Remove unused depricated fd api to pmem.

Signed-off-by: Rebecca Schultz Zavin <rebecca@android.com>

pmem: Remove error message when calling get_pmem_addr

This call is used from the mdp driver to determine if the memory
is in pmem or in the fb.  We will encounter this case during normal operation
so this error message should be removed.

Signed-off-by: Rebecca Schultz Zavin <rebecca@android.com>

pmem: Add include sched.h to fix compile errors

Signed-off-by: Rebecca Schultz Zavin <rebecca@android.com>

pmem: remove HW3D_* ioctls

Signed-off-by: Dima Zavin <dima@android.com>

pmem: Expose is_pmem_file to the in-kernel users.

Signed-off-by: Dima Zavin <dima@android.com>

pmem: Make the exposed functions be noops if CONFIG_ANDROID_PMEM is not set.

Signed-off-by: Dima Zavin <dima@android.com>

misc: pmem: don't flush if file was opened with O_SYNC

Change-Id: I067218658a0d7f7ecc1fe73e9ff6b0c3b3054653
Signed-off-by: Dima Zavin <dima@android.com>
2010-02-08 15:07:12 -08:00
Mike Lockwood
310504662c switch: switch class and GPIO drivers.
switch: Export symbol switch_set_state.

Signed-off-by: Mike Lockwood <lockwood@android.com>

switch: gpio: Don't call request_irq with interrupts disabled

Signed-off-by: Arve Hjønnevåg <arve@android.com>

switch: Use device_create instead of device_create_drvdata.

device_create_drvdata is obsolete.

Signed-off-by: Arve Hjønnevåg <arve@android.com>

switch_gpio: Add missing #include <linux/interrupt.h>

Signed-off-by: Mike Lockwood <lockwood@android.com>
2010-02-03 21:27:14 -08:00
Arve Hjønnevåg
cd1ffe8d5d power_supply: Hold a wake_lock while power supply change notifications are pending
When connecting usb or the charger the device would often go back to sleep
before the charge led and screen turned on.

Signed-off-by: Arve Hjønnevåg <arve@android.com>
2010-02-03 21:27:13 -08:00
Arve Hjønnevåg
d1d79061a9 rtc: alarm: Add in-kernel alarm interface
Drivers can now create alarms that will use an hrtimer while the
system is running and the rtc to wake up from suspend.

Signed-off-by: Arve Hjønnevåg <arve@android.com>
2010-02-03 21:27:09 -08:00
Arve Hjønnevåg
b76a757d7b rtc: Add android alarm driver. 2010-02-03 21:27:09 -08:00
Arve Hjønnevåg
e86d97ab02 PM: Add early suspend api. 2010-02-03 21:27:01 -08:00
Arve Hjønnevåg
046ebf066d PM: Add wake lock api. 2010-02-03 21:27:01 -08:00
Linus Torvalds
56f3f55cf9 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sameo/mfd-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sameo/mfd-2.6:
  mfd: Correct WM831X_MAX_ISEL_VALUE
2009-12-02 15:41:49 -08:00
David Howells
f13a48bd79 SLOW_WORK: Move slow_work's proc file to debugfs
Move slow_work's debugging proc file to debugfs.

Signed-off-by: David Howells <dhowells@redhat.com>
Requested-and-acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-01 08:20:31 -08:00
Mark Brown
7716977b6a mfd: Correct WM831X_MAX_ISEL_VALUE
There was confusion between the array size and the highest ISEL
value possible.

Reported-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2009-12-01 11:24:19 +01:00
Linus Torvalds
6e80133f7f Merge git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-2.6-fscache
* git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-2.6-fscache: (31 commits)
  FS-Cache: Provide nop fscache_stat_d() if CONFIG_FSCACHE_STATS=n
  SLOW_WORK: Fix GFS2 to #include <linux/module.h> before using THIS_MODULE
  SLOW_WORK: Fix CIFS to pass THIS_MODULE to slow_work_register_user()
  CacheFiles: Don't log lookup/create failing with ENOBUFS
  CacheFiles: Catch an overly long wait for an old active object
  CacheFiles: Better showing of debugging information in active object problems
  CacheFiles: Mark parent directory locks as I_MUTEX_PARENT to keep lockdep happy
  CacheFiles: Handle truncate unlocking the page we're reading
  CacheFiles: Don't write a full page if there's only a partial page to cache
  FS-Cache: Actually requeue an object when requested
  FS-Cache: Start processing an object's operations on that object's death
  FS-Cache: Make sure FSCACHE_COOKIE_LOOKING_UP cleared on lookup failure
  FS-Cache: Add a retirement stat counter
  FS-Cache: Handle pages pending storage that get evicted under OOM conditions
  FS-Cache: Handle read request vs lookup, creation or other cache failure
  FS-Cache: Don't delete pending pages from the page-store tracking tree
  FS-Cache: Fix lock misorder in fscache_write_op()
  FS-Cache: The object-available state can't rely on the cookie to be available
  FS-Cache: Permit cache retrieval ops to be interrupted in the initial wait phase
  FS-Cache: Use radix tree preload correctly in tracking of pages to be stored
  ...
2009-11-30 13:33:48 -08:00
Kevin Wells
4ced24c897 i2c: i2c-pnx: Made buf type unsigned to prevent sign extension
Made buf type unsigned to prevent sign extension

Signed-off-by: Kevin Wells <kevin.wells@nxp.com>
Signed-off-by: Ben Dooks <ben-linux@fluff.org>
2009-11-20 00:25:42 +00:00
Alan Cox
308efab5e2 vt: Fix use of "new" in a struct field
As this struct is exposed to user space and the API was added for this
release it's a bit of a pain for the C++ world and we still have time to
fix it. Rename the fields before we end up with that pain in an actual
release.

Signed-off-by: Alan Cox <alan@linux.intel.com>
Reported-by: Olivier Goffart
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-11-19 13:43:06 -08:00
David Howells
fee096deb4 CacheFiles: Catch an overly long wait for an old active object
Catch an overly long wait for an old, dying active object when we want to
replace it with a new one.  The probability is that all the slow-work threads
are hogged, and the delete can't get a look in.

What we do instead is:

 (1) if there's nothing in the slow work queue, we sleep until either the dying
     object has finished dying or there is something in the slow work queue
     behind which we can queue our object.

 (2) if there is something in the slow work queue, we return ETIMEDOUT to
     fscache_lookup_object(), which then puts us back on the slow work queue,
     presumably behind the deletion that we're blocked by.  We are then
     deferred for a while until we work our way back through the queue -
     without blocking a slow-work thread unnecessarily.

A backtrace similar to the following may appear in the log without this patch:

	INFO: task kslowd004:5711 blocked for more than 120 seconds.
	"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
	kslowd004     D 0000000000000000     0  5711      2 0x00000080
	 ffff88000340bb80 0000000000000046 ffff88002550d000 0000000000000000
	 ffff88002550d000 0000000000000007 ffff88000340bfd8 ffff88002550d2a8
	 000000000000ddf0 00000000000118c0 00000000000118c0 ffff88002550d2a8
	Call Trace:
	 [<ffffffff81058e21>] ? trace_hardirqs_on+0xd/0xf
	 [<ffffffffa011c4d8>] ? cachefiles_wait_bit+0x0/0xd [cachefiles]
	 [<ffffffffa011c4e1>] cachefiles_wait_bit+0x9/0xd [cachefiles]
	 [<ffffffff81353153>] __wait_on_bit+0x43/0x76
	 [<ffffffff8111ae39>] ? ext3_xattr_get+0x1ec/0x270
	 [<ffffffff813531ef>] out_of_line_wait_on_bit+0x69/0x74
	 [<ffffffffa011c4d8>] ? cachefiles_wait_bit+0x0/0xd [cachefiles]
	 [<ffffffff8104c125>] ? wake_bit_function+0x0/0x2e
	 [<ffffffffa011bc79>] cachefiles_mark_object_active+0x203/0x23b [cachefiles]
	 [<ffffffffa011c209>] cachefiles_walk_to_object+0x558/0x827 [cachefiles]
	 [<ffffffffa011a429>] cachefiles_lookup_object+0xac/0x12a [cachefiles]
	 [<ffffffffa00aa1e9>] fscache_lookup_object+0x1c7/0x214 [fscache]
	 [<ffffffffa00aafc5>] fscache_object_state_machine+0xa5/0x52d [fscache]
	 [<ffffffffa00ab4ac>] fscache_object_slow_work_execute+0x5f/0xa0 [fscache]
	 [<ffffffff81082093>] slow_work_execute+0x18f/0x2d1
	 [<ffffffff8108239a>] slow_work_thread+0x1c5/0x308
	 [<ffffffff8104c0f1>] ? autoremove_wake_function+0x0/0x34
	 [<ffffffff810821d5>] ? slow_work_thread+0x0/0x308
	 [<ffffffff8104be91>] kthread+0x7a/0x82
	 [<ffffffff8100beda>] child_rip+0xa/0x20
	 [<ffffffff8100b87c>] ? restore_args+0x0/0x30
	 [<ffffffff8104be17>] ? kthread+0x0/0x82
	 [<ffffffff8100bed0>] ? child_rip+0x0/0x20
	1 lock held by kslowd004/5711:
	 #0:  (&sb->s_type->i_mutex_key#7/1){+.+.+.}, at: [<ffffffffa011be64>] cachefiles_walk_to_object+0x1b3/0x827 [cachefiles]

Signed-off-by: David Howells <dhowells@redhat.com>
2009-11-19 18:12:05 +00:00
David Howells
a17754fb8c CacheFiles: Don't write a full page if there's only a partial page to cache
cachefiles_write_page() writes a full page to the backing file for the last
page of the netfs file, even if the netfs file's last page is only a partial
page.

This causes the EOF on the backing file to be extended beyond the EOF of the
netfs, and thus the backing file will be truncated by cachefiles_attr_changed()
called from cachefiles_lookup_object().

So we need to limit the write we make to the backing file on that last page
such that it doesn't push the EOF too far.

Also, if a backing file that has a partial page at the end is expanded, we
discard the partial page and refetch it on the basis that we then have a hole
in the file with invalid data, and should the power go out...  A better way to
deal with this could be to record a note that the partial page contains invalid
data until the correct data is written into it.

This isn't a problem for netfs's that discard the whole backing file if the
file size changes (such as NFS).

Signed-off-by: David Howells <dhowells@redhat.com>
2009-11-19 18:11:52 +00:00
David Howells
60d543ca72 FS-Cache: Start processing an object's operations on that object's death
Start processing an object's operations when that object moves into the DYING
state as the object cannot be destroyed until all its outstanding operations
have completed.

Furthermore, make sure that read and allocation operations handle being woken
up on a dead object.  Such events are recorded in the Allocs.abt and
Retrvls.abt statistics as viewable through /proc/fs/fscache/stats.

The code for waiting for object activation for the read and allocation
operations is also extracted into its own function as it is much the same in
all cases, differing only in the stats incremented.

Signed-off-by: David Howells <dhowells@redhat.com>
2009-11-19 18:11:45 +00:00
David Howells
201a15428b FS-Cache: Handle pages pending storage that get evicted under OOM conditions
Handle netfs pages that the vmscan algorithm wants to evict from the pagecache
under OOM conditions, but that are waiting for write to the cache.  Under these
conditions, vmscan calls the releasepage() function of the netfs, asking if a
page can be discarded.

The problem is typified by the following trace of a stuck process:

	kslowd005     D 0000000000000000     0  4253      2 0x00000080
	 ffff88001b14f370 0000000000000046 ffff880020d0d000 0000000000000007
	 0000000000000006 0000000000000001 ffff88001b14ffd8 ffff880020d0d2a8
	 000000000000ddf0 00000000000118c0 00000000000118c0 ffff880020d0d2a8
	Call Trace:
	 [<ffffffffa00782d8>] __fscache_wait_on_page_write+0x8b/0xa7 [fscache]
	 [<ffffffff8104c0f1>] ? autoremove_wake_function+0x0/0x34
	 [<ffffffffa0078240>] ? __fscache_check_page_write+0x63/0x70 [fscache]
	 [<ffffffffa00b671d>] nfs_fscache_release_page+0x4e/0xc4 [nfs]
	 [<ffffffffa00927f0>] nfs_release_page+0x3c/0x41 [nfs]
	 [<ffffffff810885d3>] try_to_release_page+0x32/0x3b
	 [<ffffffff81093203>] shrink_page_list+0x316/0x4ac
	 [<ffffffff8109372b>] shrink_inactive_list+0x392/0x67c
	 [<ffffffff813532fa>] ? __mutex_unlock_slowpath+0x100/0x10b
	 [<ffffffff81058df0>] ? trace_hardirqs_on_caller+0x10c/0x130
	 [<ffffffff8135330e>] ? mutex_unlock+0x9/0xb
	 [<ffffffff81093aa2>] shrink_list+0x8d/0x8f
	 [<ffffffff81093d1c>] shrink_zone+0x278/0x33c
	 [<ffffffff81052d6c>] ? ktime_get_ts+0xad/0xba
	 [<ffffffff81094b13>] try_to_free_pages+0x22e/0x392
	 [<ffffffff81091e24>] ? isolate_pages_global+0x0/0x212
	 [<ffffffff8108e743>] __alloc_pages_nodemask+0x3dc/0x5cf
	 [<ffffffff81089529>] grab_cache_page_write_begin+0x65/0xaa
	 [<ffffffff8110f8c0>] ext3_write_begin+0x78/0x1eb
	 [<ffffffff81089ec5>] generic_file_buffered_write+0x109/0x28c
	 [<ffffffff8103cb69>] ? current_fs_time+0x22/0x29
	 [<ffffffff8108a509>] __generic_file_aio_write+0x350/0x385
	 [<ffffffff8108a588>] ? generic_file_aio_write+0x4a/0xae
	 [<ffffffff8108a59e>] generic_file_aio_write+0x60/0xae
	 [<ffffffff810b2e82>] do_sync_write+0xe3/0x120
	 [<ffffffff8104c0f1>] ? autoremove_wake_function+0x0/0x34
	 [<ffffffff810b18e1>] ? __dentry_open+0x1a5/0x2b8
	 [<ffffffff810b1a76>] ? dentry_open+0x82/0x89
	 [<ffffffffa00e693c>] cachefiles_write_page+0x298/0x335 [cachefiles]
	 [<ffffffffa0077147>] fscache_write_op+0x178/0x2c2 [fscache]
	 [<ffffffffa0075656>] fscache_op_execute+0x7a/0xd1 [fscache]
	 [<ffffffff81082093>] slow_work_execute+0x18f/0x2d1
	 [<ffffffff8108239a>] slow_work_thread+0x1c5/0x308
	 [<ffffffff8104c0f1>] ? autoremove_wake_function+0x0/0x34
	 [<ffffffff810821d5>] ? slow_work_thread+0x0/0x308
	 [<ffffffff8104be91>] kthread+0x7a/0x82
	 [<ffffffff8100beda>] child_rip+0xa/0x20
	 [<ffffffff8100b87c>] ? restore_args+0x0/0x30
	 [<ffffffff8102ef83>] ? tg_shares_up+0x171/0x227
	 [<ffffffff8104be17>] ? kthread+0x0/0x82
	 [<ffffffff8100bed0>] ? child_rip+0x0/0x20

In the above backtrace, the following is happening:

 (1) A page storage operation is being executed by a slow-work thread
     (fscache_write_op()).

 (2) FS-Cache farms the operation out to the cache to perform
     (cachefiles_write_page()).

 (3) CacheFiles is then calling Ext3 to perform the actual write, using Ext3's
     standard write (do_sync_write()) under KERNEL_DS directly from the netfs
     page.

 (4) However, for Ext3 to perform the write, it must allocate some memory, in
     particular, it must allocate at least one page cache page into which it
     can copy the data from the netfs page.

 (5) Under OOM conditions, the memory allocator can't immediately come up with
     a page, so it uses vmscan to find something to discard
     (try_to_free_pages()).

 (6) vmscan finds a clean netfs page it might be able to discard (possibly the
     one it's trying to write out).

 (7) The netfs is called to throw the page away (nfs_release_page()) - but it's
     called with __GFP_WAIT, so the netfs decides to wait for the store to
     complete (__fscache_wait_on_page_write()).

 (8) This blocks a slow-work processing thread - possibly against itself.

The system ends up stuck because it can't write out any netfs pages to the
cache without allocating more memory.

To avoid this, we make FS-Cache cancel some writes that aren't in the middle of
actually being performed.  This means that some data won't make it into the
cache this time.  To support this, a new FS-Cache function is added
fscache_maybe_release_page() that replaces what the netfs releasepage()
functions used to do with respect to the cache.

The decisions fscache_maybe_release_page() makes are counted and displayed
through /proc/fs/fscache/stats on a line labelled "VmScan".  There are four
counters provided: "nos=N" - pages that weren't pending storage; "gon=N" -
pages that were pending storage when we first looked, but weren't by the time
we got the object lock; "bsy=N" - pages that we ignored as they were actively
being written when we looked; and "can=N" - pages that we cancelled the storage
of.

What I'd really like to do is alter the behaviour of the cancellation
heuristics, depending on how necessary it is to expel pages.  If there are
plenty of other pages that aren't waiting to be written to the cache that
could be ejected first, then it would be nice to hold up on immediate
cancellation of cache writes - but I don't see a way of doing that.

Signed-off-by: David Howells <dhowells@redhat.com>
2009-11-19 18:11:35 +00:00
David Howells
1bccf513ac FS-Cache: Fix lock misorder in fscache_write_op()
FS-Cache has two structs internally for keeping track of the internal state of
a cached file: the fscache_cookie struct, which represents the netfs's state,
and fscache_object struct, which represents the cache's state.  Each has a
pointer that points to the other (when both are in existence), and each has a
spinlock for pointer maintenance.

Since netfs operations approach these structures from the cookie side, they get
the cookie lock first, then the object lock.  Cache operations, on the other
hand, approach from the object side, and get the object lock first.  It is not
then permitted for a cache operation to get the cookie lock whilst it is
holding the object lock lest deadlock occur; instead, it must do one of two
things:

 (1) increment the cookie usage counter, drop the object lock and then get both
     locks in order, or

 (2) simply hold the object lock as certain parts of the cookie may not be
     altered whilst the object lock is held.

It is also not permitted to follow either pointer without holding the lock at
the end you start with.  To break the pointers between the cookie and the
object, both locks must be held.

fscache_write_op(), however, violates the locking rules: It attempts to get the
cookie lock without (a) checking that the cookie pointer is a valid pointer,
and (b) holding the object lock to protect the cookie pointer whilst it follows
it.  This is so that it can access the pending page store tree without
interference from __fscache_write_page().

This is fixed by splitting the cookie lock, such that the page store tracking
tree is protected by its own lock, and checking that the cookie pointer is
non-NULL before we attempt to follow it whilst holding the object lock.

The new lock is subordinate to both the cookie lock and the object lock, and so
should be taken after those.

Signed-off-by: David Howells <dhowells@redhat.com>
2009-11-19 18:11:25 +00:00
David Howells
4fbf4291aa FS-Cache: Allow the current state of all objects to be dumped
Allow the current state of all fscache objects to be dumped by doing:

	cat /proc/fs/fscache/objects

By default, all objects and all fields will be shown.  This can be restricted
by adding a suitable key to one of the caller's keyrings (such as the session
keyring):

	keyctl add user fscache:objlist "<restrictions>" @s

The <restrictions> are:

	K	Show hexdump of object key (don't show if not given)
	A	Show hexdump of object aux data (don't show if not given)

And paired restrictions:

	C	Show objects that have a cookie
	c	Show objects that don't have a cookie
	B	Show objects that are busy
	b	Show objects that aren't busy
	W	Show objects that have pending writes
	w	Show objects that don't have pending writes
	R	Show objects that have outstanding reads
	r	Show objects that don't have outstanding reads
	S	Show objects that have slow work queued
	s	Show objects that don't have slow work queued

If neither side of a restriction pair is given, then both are implied.  For
example:

	keyctl add user fscache:objlist KB @s

shows objects that are busy, and lists their object keys, but does not dump
their auxiliary data.  It also implies "CcWwRrSs", but as 'B' is given, 'b' is
not implied.

Signed-off-by: David Howells <dhowells@redhat.com>
2009-11-19 18:11:04 +00:00
David Howells
440f0affe2 FS-Cache: Annotate slow-work runqueue proc lines for FS-Cache work items
Annotate slow-work runqueue proc lines for FS-Cache work items.  Objects
include the object ID and the state.  Operations include the object ID, the
operation ID and the operation type and state.

Signed-off-by: David Howells <dhowells@redhat.com>
2009-11-19 18:11:01 +00:00
David Howells
3bde31a4ac SLOW_WORK: Allow a requeueable work item to sleep till the thread is needed
Add a function to allow a requeueable work item to sleep till the thread
processing it is needed by the slow-work facility to perform other work.

Sometimes a work item can't progress immediately, but must wait for the
completion of another work item that's currently being processed by another
slow-work thread.

In some circumstances, the waiting item could instead - theoretically - put
itself back on the queue and yield its thread back to the slow-work facility,
thus waiting till it gets processing time again before attempting to progress.
This would allow other work items processing time on that thread.

However, this only works if there is something on the queue for it to queue
behind - otherwise it will just get a thread again immediately, and will end
up cycling between the queue and the thread, eating up valuable CPU time.

So, slow_work_sleep_till_thread_needed() is provided such that an item can put
itself on a wait queue that will wake it up when the event it is actually
interested in occurs, then call this function in lieu of calling schedule().

This function will then sleep until either the item's event occurs or another
work item appears on the queue.  If another work item is queued, but the
item's event hasn't occurred, then the work item should requeue itself and
yield the thread back to the slow-work facility by returning.

This can be used by CacheFiles for an object that is being created on one
thread to wait for an object being deleted on another thread where there is
nothing on the queue for the creation to go and wait behind.  As soon as an
item appears on the queue that could be given thread time instead, CacheFiles
can stick the creating object back on the queue and return to the slow-work
facility - assuming the object deletion didn't also complete.

Signed-off-by: David Howells <dhowells@redhat.com>
2009-11-19 18:10:57 +00:00
David Howells
31ba99d304 SLOW_WORK: Allow the owner of a work item to determine if it is queued or not
Add a function (slow_work_is_queued()) to permit the owner of a work item to
determine if the item is queued or not.

The work item is counted as being queued if it is actually on the queue, not
just if it is pending.  If it is executing and pending, then it is not on the
queue, but will rather be put back on the queue when execution finishes.

This permits a caller to quickly work out if it may be able to put another,
dependent work item on the queue behind it, or whether it will have to wait
till that is finished.

This can be used by CacheFiles to work out whether the creation a new object
can be immediately deferred when it has to wait for an old object to be
deleted, or whether a wait must take place.  If a wait is necessary, then the
slow-work thread can otherwise get blocked, preventing the deletion from
taking place.

Signed-off-by: David Howells <dhowells@redhat.com>
2009-11-19 18:10:53 +00:00
David Howells
8fba10a42d SLOW_WORK: Allow the work items to be viewed through a /proc file
Allow the executing and queued work items to be viewed through a /proc file
for debugging purposes.  The contents look something like the following:

    THR PID   ITEM ADDR        FL MARK  DESC
    === ===== ================ == ===== ==========
      0  3005 ffff880023f52348  a 952ms FSC: OBJ17d3: LOOK
      1  3006 ffff880024e33668  2 160ms FSC: OBJ17e5 OP60d3b: Write1/Store fl=2
      2  3165 ffff8800296dd180  a 424ms FSC: OBJ17e4: LOOK
      3  4089 ffff8800262c8d78  a 212ms FSC: OBJ17ea: CRTN
      4  4090 ffff88002792bed8  2 388ms FSC: OBJ17e8 OP60d36: Write1/Store fl=2
      5  4092 ffff88002a0ef308  2 388ms FSC: OBJ17e7 OP60d2e: Write1/Store fl=2
      6  4094 ffff88002abaf4b8  2 132ms FSC: OBJ17e2 OP60d4e: Write1/Store fl=2
      7  4095 ffff88002bb188e0  a 388ms FSC: OBJ17e9: CRTN
    vsq     - ffff880023d99668  1 308ms FSC: OBJ17e0 OP60f91: Write1/EnQ fl=2
    vsq     - ffff8800295d1740  1 212ms FSC: OBJ16be OP4d4b6: Write1/EnQ fl=2
    vsq     - ffff880025ba3308  1 160ms FSC: OBJ179a OP58dec: Write1/EnQ fl=2
    vsq     - ffff880024ec83e0  1 160ms FSC: OBJ17ae OP599f2: Write1/EnQ fl=2
    vsq     - ffff880026618e00  1 160ms FSC: OBJ17e6 OP60d33: Write1/EnQ fl=2
    vsq     - ffff880025a2a4b8  1 132ms FSC: OBJ16a2 OP4d583: Write1/EnQ fl=2
    vsq     - ffff880023cbe6d8  9 212ms FSC: OBJ17eb: LOOK
    vsq     - ffff880024d37590  9 212ms FSC: OBJ17ec: LOOK
    vsq     - ffff880027746cb0  9 212ms FSC: OBJ17ed: LOOK
    vsq     - ffff880024d37ae8  9 212ms FSC: OBJ17ee: LOOK
    vsq     - ffff880024d37cb0  9 212ms FSC: OBJ17ef: LOOK
    vsq     - ffff880025036550  9 212ms FSC: OBJ17f0: LOOK
    vsq     - ffff8800250368e0  9 212ms FSC: OBJ17f1: LOOK
    vsq     - ffff880025036aa8  9 212ms FSC: OBJ17f2: LOOK

In the 'THR' column, executing items show the thread they're occupying and
queued threads indicate which queue they're on.  'PID' shows the process ID of
a slow-work thread that's executing something.  'FL' shows the work item flags.
'MARK' indicates how long since an item was queued or began executing.  Lastly,
the 'DESC' column permits the owner of an item to give some information.

Signed-off-by: David Howells <dhowells@redhat.com>
2009-11-19 18:10:51 +00:00
Jens Axboe
6b8268b17a SLOW_WORK: Add delayed_slow_work support
This adds support for starting slow work with a delay, similar
to the functionality we have for workqueues.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: David Howells <dhowells@redhat.com>
2009-11-19 18:10:47 +00:00
Jens Axboe
0160950297 SLOW_WORK: Add support for cancellation of slow work
Add support for cancellation of queued slow work and delayed slow work items.
The cancellation functions will wait for items that are pending or undergoing
execution to be discarded by the slow work facility.

Attempting to enqueue work that is in the process of being cancelled will
result in ECANCELED.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: David Howells <dhowells@redhat.com>
2009-11-19 18:10:43 +00:00
David Howells
3d7a641e54 SLOW_WORK: Wait for outstanding work items belonging to a module to clear
Wait for outstanding slow work items belonging to a module to clear when
unregistering that module as a user of the facility.  This prevents the put_ref
code of a work item from being taken away before it returns.

Signed-off-by: David Howells <dhowells@redhat.com>
2009-11-19 18:10:23 +00:00