Commit graph

319273 commits

Author SHA1 Message Date
Josef Bacik
c5c3c5f31e Btrfs: remove ->dirty_inode
We do all of our inode updating when we change it, and now that we do
->update_time we don't need ->dirty_inode for atime updates anymore, so just
remove it.  Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
2012-07-23 15:41:38 -04:00
Chris Mason
cbea5ac1ee Btrfs: reduce calls to wake_up on uncontended locks
The btrfs locks were unconditionally calling wake_up as the
locks were released.  This lead to extra thrashing on the waitqueue,
especially for locks that were dominated by readers.

Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2012-07-23 15:36:18 -04:00
Chris Mason
e39e64ac0c Btrfs: don't wait around for new log writers on an SSD
Waiting on spindles improves performance, but ssds want all the
IO as quickly as we can push it down.

Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2012-07-23 15:36:17 -04:00
Linus Torvalds
a66d2c8f7e Merge branch 'for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull the big VFS changes from Al Viro:
 "This one is *big* and changes quite a few things around VFS.  What's in there:

   - the first of two really major architecture changes - death to open
     intents.

     The former is finally there; it was very long in making, but with
     Miklos getting through really hard and messy final push in
     fs/namei.c, we finally have it.  Unlike his variant, this one
     doesn't introduce struct opendata; what we have instead is
     ->atomic_open() taking preallocated struct file * and passing
     everything via its fields.

     Instead of returning struct file *, it returns -E...  on error, 0
     on success and 1 in "deal with it yourself" case (e.g.  symlink
     found on server, etc.).

     See comments before fs/namei.c:atomic_open().  That made a lot of
     goodies finally possible and quite a few are in that pile:
     ->lookup(), ->d_revalidate() and ->create() do not get struct
     nameidata * anymore; ->lookup() and ->d_revalidate() get lookup
     flags instead, ->create() gets "do we want it exclusive" flag.

     With the introduction of new helper (kern_path_locked()) we are rid
     of all struct nameidata instances outside of fs/namei.c; it's still
     visible in namei.h, but not for long.  Come the next cycle,
     declaration will move either to fs/internal.h or to fs/namei.c
     itself.  [me, miklos, hch]

   - The second major change: behaviour of final fput().  Now we have
     __fput() done without any locks held by caller *and* not from deep
     in call stack.

     That obviously lifts a lot of constraints on the locking in there.
     Moreover, it's legal now to call fput() from atomic contexts (which
     has immediately simplified life for aio.c).  We also don't need
     anti-recursion logics in __scm_destroy() anymore.

     There is a price, though - the damn thing has become partially
     asynchronous.  For fput() from normal process we are guaranteed
     that pending __fput() will be done before the caller returns to
     userland, exits or gets stopped for ptrace.

     For kernel threads and atomic contexts it's done via
     schedule_work(), so theoretically we might need a way to make sure
     it's finished; so far only one such place had been found, but there
     might be more.

     There's flush_delayed_fput() (do all pending __fput()) and there's
     __fput_sync() (fput() analog doing __fput() immediately).  I hope
     we won't need them often; see warnings in fs/file_table.c for
     details.  [me, based on task_work series from Oleg merged last
     cycle]

   - sync series from Jan

   - large part of "death to sync_supers()" work from Artem; the only
     bits missing here are exofs and ext4 ones.  As far as I understand,
     those are going via the exofs and ext4 trees resp.; once they are
     in, we can put ->write_super() to the rest, along with the thread
     calling it.

   - preparatory bits from unionmount series (from dhowells).

   - assorted cleanups and fixes all over the place, as usual.

  This is not the last pile for this cycle; there's at least jlayton's
  ESTALE work and fsfreeze series (the latter - in dire need of fixes,
  so I'm not sure it'll make the cut this cycle).  I'll probably throw
  symlink/hardlink restrictions stuff from Kees into the next pile, too.
  Plus there's a lot of misc patches I hadn't thrown into that one -
  it's large enough as it is..."

* 'for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (127 commits)
  ext4: switch EXT4_IOC_RESIZE_FS to mnt_want_write_file()
  btrfs: switch btrfs_ioctl_balance() to mnt_want_write_file()
  switch dentry_open() to struct path, make it grab references itself
  spufs: shift dget/mntget towards dentry_open()
  zoran: don't bother with struct file * in zoran_map
  ecryptfs: don't reinvent the wheels, please - use struct completion
  don't expose I_NEW inodes via dentry->d_inode
  tidy up namei.c a bit
  unobfuscate follow_up() a bit
  ext3: pass custom EOF to generic_file_llseek_size()
  ext4: use core vfs llseek code for dir seeks
  vfs: allow custom EOF in generic_file_llseek code
  vfs: Avoid unnecessary WB_SYNC_NONE writeback during sys_sync and reorder sync passes
  vfs: Remove unnecessary flushing of block devices
  vfs: Make sys_sync writeout also block device inodes
  vfs: Create function for iterating over block devices
  vfs: Reorder operations during sys_sync
  quota: Move quota syncing to ->sync_fs method
  quota: Split dquot_quota_sync() to writeback and cache flushing part
  vfs: Move noop_backing_dev_info check from sync into writeback
  ...
2012-07-23 12:27:27 -07:00
Wanpeng Li
1d00015e26 mm/frontswap: cleanup doc and comment error
Signed-off-by: Wanpeng Li <liwp.linux@gmail.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-07-23 11:16:20 -04:00
Sasha Levin
3389b530a6 mm: frontswap: remove unneeded headers
Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
[v1: Rebased with tracing removed]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-07-23 11:16:13 -04:00
Michael Walle
8ceffa7c4a spi/orion: remove uneeded spi_info
This was formerly used to store the tclk value. This is now discovered
using the clk API, rather than pass it as platform data.

Signed-off-by: Michael Walle <michael@walle.cc>
Acked-by: Jason Cooper <jason@lakedaemon.net>
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Acked-by: Sebastian Hesselbarth <sebastian.hesselbarth@googlemail.com>
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
2012-07-23 14:14:54 +01:00
Florian Fainelli
d76ea24ac4 spi/bcm63xx: fix clock configuration selection
We are currently using an inferior or equal operator for comparing
the transfer frequency with the clock frequency table. Because of
this, we always end up selecting 20Mhz as a frequency, due to the
inequality transfer hz <= 20 Mhz being always true. Fix this by
reversing the inequality, which is how the comparison should be done.

Signed-off-by: Florian Fainelli <florian@openwrt.org>
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
2012-07-23 14:14:11 +01:00
Takashi Iwai
c1b623d9e4 ASoC: Additional updates for 3.6
A few more fixes for 3.6, some of which are relatively important -
 they've all been in -next for at least some time.
 
 - DAPM fixes for the recent locking changes.
 - Fix for _PRE and _POST widgets (which have been broken for a few
   releases now).
 - A couple of minor driver updates.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIcBAABAgAGBQJQDSB4AAoJEBus8iNuMP3d7rYP/in6d3kH48tNMZU1K6YJi2fi
 5f7pmq7RAH86Py5jHD1m0/aGeXzltJqF2UFgOQGoh0JLsM3bnJtAXV4uCqxBc8wr
 wNfueXV5az3eZtDcZKzX0ZmgqZV06/MugDzP/P+roZB7cAH8bHA2yBajgiNLIYpB
 7DD4HMOQcHtVPeM4ioHsFq3xXIiiZ/ZbJ353gKluYgUhHFjzptMSjz/1eE3gv5RU
 CY0LQe36nfJH7cbmZseftnwf+BCWZoWxgEkjM0dcHs+uw1xe1DjpH9zVvyl1MpnS
 1Kgl6+SiVHmqmUEyS8pU1PsMo7ssrLUh4sNxA/FIJQEdl8x/ykraOILU9tdGX3gS
 HLi7okK6p0sMoPlTBayoqY/EgFI9NL6e3xG9jSVICareSqh0Zd32bCZ7bOaQC7G4
 Zd9i3Mkr3ce1OqTuiaS+CMT+j6KH7w54X/lPP8zQhT/f0sF9fZ8Y9VXwHQhe5pFD
 B4BJDk8nPB5Bc1izM/dRWF/Y3NP0O33PB3tFHLNZVPXpeuk8TUP6YLr51UYwxmip
 /++UAIwJFv0EvZ4772raIOGwQcr8ypPdClp17Uj7xqVX68wcueFH8uiLzLkJRbF0
 wB0BbDj+IQeU/fwN216hvS44qteJD0XrSv54jd0C6jDqxz6bj53kt6PvttSgjvZq
 tj5BifekUSCK3TQyN6LN
 =47kf
 -----END PGP SIGNATURE-----

Merge tag 'asoc-3.6' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-next

ASoC: Additional updates for 3.6

A few more fixes for 3.6, some of which are relatively important -
they've all been in -next for at least some time.

- DAPM fixes for the recent locking changes.
- Fix for _PRE and _POST widgets (which have been broken for a few
  releases now).
- A couple of minor driver updates.
2012-07-23 14:34:42 +02:00
Axel Lin
0dd6e4847e watchdog: orion_wdt: Convert driver to watchdog core
Convert orion_wdt driver to use watchdog framework API.

Signed-off-by: Axel Lin <axel.lin@gmail.com>
Tested-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
2012-07-23 12:51:09 +02:00
Sachin Kamat
6b761b2902 watchdog: s3c2410_wdt: Use module_platform_driver()
module_platform_driver() replaces module_init() and module_exit()
and makes the code simpler.

Signed-off-by: Sachin Kamat <sachin.kamat@linaro.org>
Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
2012-07-23 12:50:51 +02:00
Wim Van Sebroeck
7732c6b96f watchdog: sch311x_wdt: Fix Polarity when starting watchdog
Some motherboards like the Advantech ARK3400 documentation
use a non-inverted GPIO pin. We fix this by assuming that
the BIOS will set the Polarity bit for the GPIO correctly
at startup and we keep the Bit-setting intact when we start
and stop the watchdog.

Reported-by: Jean-François Deverge <jf.deverge@gmail.com>
Signed-off-by: Dave Mueller <d.mueller@elsoft.ch>
Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
2012-07-23 12:50:30 +02:00
Lokesh Vutla
41814eed41 Watchdog: OMAP: Fix the runtime pm code to avoid module getting stuck intransition state.
OMAP watchdog driver is adapted to runtime PM like a general device
driver but it is not appropriate. It is causing couple of functional
issues.

1. On OMAP4 SYSCLK can't be gated, because of issue with WDTIMER2 module,
which constantly stays in "in transition" state. Value of register
CM_WKUP_WDTIMER2_CLKCTRL is always 0x00010000 in this case.
Issue occurs immediately after first idle, when hwmod framework tries
to disable WDTIMER2 functional clock - "wd_timer2_fck". After this
module falls to "in transition" state, and SYSCLK gating is blocked.

2. Due to runtime PM, watchdog timer may be completely disabled.
In current code base watchdog timer is not disabled only because of
issue 1. Otherwise state of WDTIMER2 module will be "Disabled", and there
will be no interrupts from omap_wdt. In other words watchdog will not
work at all.

Watchdong is a special IP and it should not be disabled otherwise
purpose of it itself is defeated. Watchdog functional clock should
never be disabled. This patch updates the runtime PM handling in
driver so that runtime PM is limited only during probe/shutdown
and suspend/resume.

The patch fixes issue 1 and 2

Signed-off-by: Lokesh Vutla <lokeshvutla@ti.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
2012-07-23 12:50:11 +02:00
Gerard Snitselaar
0402450f45 watchdog: ie6xx_wdt: section mismatch in ie6xx_wdt_probe()
ie6xx_wdt_probe() calls ie6xx_wdt_debugfs_exit() as part of
it's error cleanup path, and ie6xx_wdt_debugfs_exit() is
currently annotated __devexit.

Signed-off-by: Gerard Snitselaar <dev@snitselaar.org>
Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
2012-07-23 12:49:44 +02:00
Florian Fainelli
5a135f3c72 watchdog: bcm63xx_wdt: fix driver section mismatch
bcm63xx_wdt was used as a platform_driver but was not suffixed with
_driver, thus causing section mismatches, fix that.

Signed-off-by: Florian Fainelli <florian@openwrt.org>
Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
2012-07-23 12:49:24 +02:00
Wim Van Sebroeck
bff23431fe watchdog: iTCO_wdt.c: convert to watchdog core
This patch converts the iTCO_wdt watchdog driver to use the
generic watchdog framework.

Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
2012-07-23 12:48:41 +02:00
Oskar Schirmer
18cb2ae55f char/ipmi: remove local ioctl defines replaced by generic ones
This watchdog driver had ioctl defines introduced locally
for pre timeout handling, marked to be removed as soon as
a generic replacement would become available.

The latter has actually occurred in 2006, at e05b59fe.

Remove the local duplicates for pre timeout handling.

Signed-off-by: Oskar Schirmer <oskar@scara.com>
Acked-by: Corey Minyard <cminyard@mvista.com>
Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2012-07-23 12:48:04 +02:00
Michal Simek
90fe6c608f watchdog: xilinx: Read clock frequency directly from DT node
Do not use clock-frequency property from parent node.
Use it from watchdog node.

Signed-off-by: Michal Simek <monstr@monstr.eu>
Acked-By: Alejandro Cabrera <acabrera@udio.cujae.edu.cu>
Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
2012-07-23 12:47:00 +02:00
Linus Walleij
c362cb597b watchdog: coh901327_wdt: use clk_prepare/unprepare
Make sure we prepare/unprepare the COH901327 watchdog timer
as is required by the clk API especially if you use common
clock.

Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Reviewed-by : Pankaj Jangra <jangra.pankaj9@gmail.com>
Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
2012-07-23 12:46:49 +02:00
Justin Wheeler
3017020dc7 watchdog: f71808e_wdt: Add support for Jetway JNF99 motherboard
The Jetway JNF99 motherboard features a F71869 SuperIO chip, but its
watchdog chipset ID appears to be 1007 (as opposed to 0814).  Some testing
confirmed it behaves the exact same as 0814. So add this chipset ID to the
module's ID list so that the Fintek watchdog driver can correctly identify
and access it.

Signed-off-by: Justin Wheeler <jwheeler@datademons.com>
Acked-by: Giel van Schijndel <me@mortis.eu>
Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
2012-07-23 12:46:38 +02:00
Andrew Lunn
f814f9ac5a spi/orion: add device tree binding
Signed-off-by: Michael Walle <michael@walle.cc>
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Acked-by: Jason Cooper <jason@lakedaemon.net>
Acked-by: Sebastian Hesselbarth <sebastian.hesselbarth@googlemail.com>
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
2012-07-23 11:43:39 +01:00
Joerg Roedel
395e51f18d Merge branches 'iommu/fixes', 'x86/amd', 'groups', 'arm/tegra' and 'api/domain-attr' into next
Conflicts:
	drivers/iommu/iommu.c
	include/linux/iommu.h
2012-07-23 12:17:00 +02:00
Cyrus Lien
2d8767bb42 HID: add ASUS AIO keyboard model AK1D
Add Asus All-In-One PC keyboard model AK1D.

BugLink: https://bugs.launchpad.net/bugs/1027789

Signed-off-by: Cyrus Lien <cyrus.lien@canonical.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2012-07-23 12:10:21 +02:00
Mark Brown
15d47763b3 Merge branch 'for-3.5' into for-3.6 2012-07-23 10:45:07 +01:00
Mark Brown
0ff97ebf08 ASoC: dapm: Fix _PRE and _POST events for DAPM performance improvements
Ever since the DAPM performance improvements we've been marking all widgets
as not dirty after each DAPM run. Since _PRE and _POST events aren't part
of the DAPM graph this has rendered them non-functional, they will never be
marked dirty again and thus will never be run again.

Fix this by skipping them when marking widgets as not dirty.

Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Acked-by: Liam Girdwood <lrg@ti.com>
Cc: stable@vger.kernel.org
2012-07-23 10:39:54 +01:00
Weiping Pan
06b6a1cf6e rds: set correct msg_namelen
Jay Fenlason (fenlason@redhat.com) found a bug,
that recvfrom() on an RDS socket can return the contents of random kernel
memory to userspace if it was called with a address length larger than
sizeof(struct sockaddr_in).
rds_recvmsg() also fails to set the addr_len paramater properly before
returning, but that's just a bug.
There are also a number of cases wher recvfrom() can return an entirely bogus
address. Anything in rds_recvmsg() that returns a non-negative value but does
not go through the "sin = (struct sockaddr_in *)msg->msg_name;" code path
at the end of the while(1) loop will return up to 128 bytes of kernel memory
to userspace.

And I write two test programs to reproduce this bug, you will see that in
rds_server, fromAddr will be overwritten and the following sock_fd will be
destroyed.
Yes, it is the programmer's fault to set msg_namelen incorrectly, but it is
better to make the kernel copy the real length of address to user space in
such case.

How to run the test programs ?
I test them on 32bit x86 system, 3.5.0-rc7.

1 compile
gcc -o rds_client rds_client.c
gcc -o rds_server rds_server.c

2 run ./rds_server on one console

3 run ./rds_client on another console

4 you will see something like:
server is waiting to receive data...
old socket fd=3
server received data from client:data from client
msg.msg_namelen=32
new socket fd=-1067277685
sendmsg()
: Bad file descriptor

/***************** rds_client.c ********************/

int main(void)
{
	int sock_fd;
	struct sockaddr_in serverAddr;
	struct sockaddr_in toAddr;
	char recvBuffer[128] = "data from client";
	struct msghdr msg;
	struct iovec iov;

	sock_fd = socket(AF_RDS, SOCK_SEQPACKET, 0);
	if (sock_fd < 0) {
		perror("create socket error\n");
		exit(1);
	}

	memset(&serverAddr, 0, sizeof(serverAddr));
	serverAddr.sin_family = AF_INET;
	serverAddr.sin_addr.s_addr = inet_addr("127.0.0.1");
	serverAddr.sin_port = htons(4001);

	if (bind(sock_fd, (struct sockaddr*)&serverAddr, sizeof(serverAddr)) < 0) {
		perror("bind() error\n");
		close(sock_fd);
		exit(1);
	}

	memset(&toAddr, 0, sizeof(toAddr));
	toAddr.sin_family = AF_INET;
	toAddr.sin_addr.s_addr = inet_addr("127.0.0.1");
	toAddr.sin_port = htons(4000);
	msg.msg_name = &toAddr;
	msg.msg_namelen = sizeof(toAddr);
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_iov->iov_base = recvBuffer;
	msg.msg_iov->iov_len = strlen(recvBuffer) + 1;
	msg.msg_control = 0;
	msg.msg_controllen = 0;
	msg.msg_flags = 0;

	if (sendmsg(sock_fd, &msg, 0) == -1) {
		perror("sendto() error\n");
		close(sock_fd);
		exit(1);
	}

	printf("client send data:%s\n", recvBuffer);

	memset(recvBuffer, '\0', 128);

	msg.msg_name = &toAddr;
	msg.msg_namelen = sizeof(toAddr);
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_iov->iov_base = recvBuffer;
	msg.msg_iov->iov_len = 128;
	msg.msg_control = 0;
	msg.msg_controllen = 0;
	msg.msg_flags = 0;
	if (recvmsg(sock_fd, &msg, 0) == -1) {
		perror("recvmsg() error\n");
		close(sock_fd);
		exit(1);
	}

	printf("receive data from server:%s\n", recvBuffer);

	close(sock_fd);

	return 0;
}

/***************** rds_server.c ********************/

int main(void)
{
	struct sockaddr_in fromAddr;
	int sock_fd;
	struct sockaddr_in serverAddr;
	unsigned int addrLen;
	char recvBuffer[128];
	struct msghdr msg;
	struct iovec iov;

	sock_fd = socket(AF_RDS, SOCK_SEQPACKET, 0);
	if(sock_fd < 0) {
		perror("create socket error\n");
		exit(0);
	}

	memset(&serverAddr, 0, sizeof(serverAddr));
	serverAddr.sin_family = AF_INET;
	serverAddr.sin_addr.s_addr = inet_addr("127.0.0.1");
	serverAddr.sin_port = htons(4000);
	if (bind(sock_fd, (struct sockaddr*)&serverAddr, sizeof(serverAddr)) < 0) {
		perror("bind error\n");
		close(sock_fd);
		exit(1);
	}

	printf("server is waiting to receive data...\n");
	msg.msg_name = &fromAddr;

	/*
	 * I add 16 to sizeof(fromAddr), ie 32,
	 * and pay attention to the definition of fromAddr,
	 * recvmsg() will overwrite sock_fd,
	 * since kernel will copy 32 bytes to userspace.
	 *
	 * If you just use sizeof(fromAddr), it works fine.
	 * */
	msg.msg_namelen = sizeof(fromAddr) + 16;
	/* msg.msg_namelen = sizeof(fromAddr); */
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_iov->iov_base = recvBuffer;
	msg.msg_iov->iov_len = 128;
	msg.msg_control = 0;
	msg.msg_controllen = 0;
	msg.msg_flags = 0;

	while (1) {
		printf("old socket fd=%d\n", sock_fd);
		if (recvmsg(sock_fd, &msg, 0) == -1) {
			perror("recvmsg() error\n");
			close(sock_fd);
			exit(1);
		}
		printf("server received data from client:%s\n", recvBuffer);
		printf("msg.msg_namelen=%d\n", msg.msg_namelen);
		printf("new socket fd=%d\n", sock_fd);
		strcat(recvBuffer, "--data from server");
		if (sendmsg(sock_fd, &msg, 0) == -1) {
			perror("sendmsg()\n");
			close(sock_fd);
			exit(1);
		}
	}

	close(sock_fd);
	return 0;
}

Signed-off-by: Weiping Pan <wpan@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-23 01:01:44 -07:00
Dan Carpenter
5b3e7e6cb5 openvswitch: potential NULL deref in sample()
If there is no OVS_SAMPLE_ATTR_ACTIONS set then "acts_list" is NULL and
it leads to a NULL dereference when we call nla_len(acts_list).  This
is a static checker fix, not something I have seen in testing.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-23 00:59:54 -07:00
Eric Dumazet
563d34d057 tcp: dont drop MTU reduction indications
ICMP messages generated in output path if frame length is bigger than
mtu are actually lost because socket is owned by user (doing the xmit)

One example is the ipgre_tunnel_xmit() calling
icmp_send(skb, ICMP_DEST_UNREACH, ICMP_FRAG_NEEDED, htonl(mtu));

We had a similar case fixed in commit a34a101e1e (ipv6: disable GSO on
sockets hitting dst_allfrag).

Problem of such fix is that it relied on retransmit timers, so short tcp
sessions paid a too big latency increase price.

This patch uses the tcp_release_cb() infrastructure so that MTU
reduction messages (ICMP messages) are not lost, and no extra delay
is added in TCP transmits.

Reported-by: Maciej Żenczykowski <maze@google.com>
Diagnosed-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Nandita Dukkipati <nanditad@google.com>
Cc: Tom Herbert <therbert@google.com>
Cc: Tore Anderson <tore@fud.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-23 00:58:46 -07:00
Yuval Mintz
c3def943c7 bnx2x: Add new 57840 device IDs
The 57840 boards come in two flavours: 2 x 20G and 4 x 10G.
To better differentiate between the two flavours, a separate device ID
was assigned to each.
The silicon default value is still the currently supported 57840 device ID
(0x168d), and since a user can damage the nvram (e.g., 'ethtool -E')
the driver will still support this device ID to allow the user to amend the
nvram back into a supported configuration.

Notice this patch contains lines longer than 80 characters (strings).

Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-23 00:58:21 -07:00
Julian Anastasov
9a0a9502cb tcp: avoid oops in tcp_metrics and reset tcpm_stamp
In tcp_tw_remember_stamp we incorrectly checked tw
instead of tm, it can lead to oops if the cached entry is
not found.

	tcpm_stamp was not updated in tcpm_check_stamp when
tcpm_suck_dst was called, move the update into tcpm_suck_dst,
so that we do not call it infinitely on every next cache hit
after TCP_METRICS_TIMEOUT.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-23 00:57:12 -07:00
Shuah Khan
9b70749e64 niu: Change niu_rbr_fill() to use unlikely() to check niu_rbr_add_page() return value
Change niu_rbr_fill() to use unlikely() to check niu_rbr_add_page() return
value to be consistent with the rest of the checks after niu_rbr_add_page()
calls in this file.

Signed-off-by: Shuah Khan <shuah.khan@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-22 23:31:07 -07:00
Shuah Khan
ec2deec1f3 niu: Fix to check for dma mapping errors.
Fix Neptune ethernet driver to check dma mapping error after map_page()
interface returns.

Signed-off-by: Shuah Khan <shuah.khan@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-22 23:31:06 -07:00
Roland Dreier
089117e1ad Merge branches 'cma', 'cxgb4', 'misc', 'mlx4-sriov', 'mlx-cleanups', 'ocrdma' and 'qib' into for-linus 2012-07-22 23:26:17 -07:00
Cong Wang
61d06c83fc tile: remove usage of enum km_type
Acked-by: Chris Metcalf <cmetcalf@tilera.com>
Signed-off-by: Cong Wang <amwang@redhat.com>
2012-07-23 14:11:23 +08:00
Cong Wang
144cf8647a frv: remove the second parameter of kmap_atomic_primary()
All callers of kmap_atomic_primary() use __KM_CACHE, so it can be
removed safely, and __kmap_atomic_primary() only check if 'type' if
__KM_CACHE or not, so 'type' can be changed to a boolean as well.

Ditto for kunmap_atomic_primary()/__kunmap_atomic_primary().

Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Cong Wang <amwang@redhat.com>
2012-07-23 14:11:22 +08:00
Cong Wang
906adea153 jbd2: remove the second argument of kmap_atomic
Signed-off-by: Cong Wang <amwang@redhat.com>
2012-07-23 14:11:22 +08:00
Benjamin Herrenschmidt
574ce79cea powerpc/mpic: Create a revmap with enough entries for IPIs and timers
The current mpic code creates a linear revmap just big enough for all
the sources, which happens to miss the IPIs and timers on some machines.

This will in turn break when the irqdomain code loses the fallback of
doing a linear search when the revmap fails (and really slows down IPIs
otherwise).

This happens for example on the U4 based Apple machines such as the
dual core PowerMac G5s.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-07-23 14:20:42 +10:00
Theodore Ts'o
03179fe923 ext4: undo ext4_calc_metadata_amount if we fail to claim space
The function ext4_calc_metadata_amount() has side effects, although
it's not obvious from its function name.  So if we fail to claim
space, regardless of whether we retry to claim the space again, or
return an error, we need to undo these side effects.

Otherwise we can end up incorrectly calculating the number of metadata
blocks needed for the operation, which was responsible for an xfstests
failure for test #271 when using an ext2 file system with delalloc
enabled.

Reported-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@vger.kernel.org
2012-07-23 00:00:20 -04:00
Brian Foster
97795d2a5b ext4: don't let i_reserved_meta_blocks go negative
If we hit a condition where we have allocated metadata blocks that
were not appropriately reserved, we risk underflow of
ei->i_reserved_meta_blocks.  In turn, this can throw
sbi->s_dirtyclusters_counter significantly out of whack and undermine
the nondelalloc fallback logic in ext4_nonda_switch().  Warn if this
occurs and set i_allocated_meta_blocks to avoid this problem.

This condition is reproduced by xfstests 270 against ext2 with
delalloc enabled:

Mar 28 08:58:02 localhost kernel: [  171.526344] EXT4-fs (loop1): delayed block allocation failed for inode 14 at logical offset 64486 with max blocks 64 with error -28
Mar 28 08:58:02 localhost kernel: [  171.526346] EXT4-fs (loop1): This should not happen!! Data will be lost

270 ultimately fails with an inconsistent filesystem and requires an
fsck to repair.  The cause of the error is an underflow in
ext4_da_update_reserve_space() due to an unreserved meta block
allocation.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@vger.kernel.org
2012-07-22 23:59:40 -04:00
Ashish Sangwan
968dee7722 ext4: fix hole punch failure when depth is greater than 0
Whether to continue removing extents or not is decided by the return
value of function ext4_ext_more_to_rm() which checks 2 conditions:
a) if there are no more indexes to process.
b) if the number of entries are decreased in the header of "depth -1".

In case of hole punch, if the last block to be removed is not part of
the last extent index than this index will not be deleted, hence the
number of valid entries in the extent header of "depth - 1" will
remain as it is and ext4_ext_more_to_rm will return 0 although the
required blocks are not yet removed.

This patch fixes the above mentioned problem as instead of removing
the extents from the end of file, it starts removing the blocks from
the particular extent from which removing blocks is actually required
and continue backward until done.

Signed-off-by: Ashish Sangwan <ashish.sangwan2@gmail.com>
Signed-off-by: Namjae Jeon <linkinjeon@gmail.com>
Reviewed-by: Lukas Czerner <lczerner@redhat.com>
Cc: stable@vger.kernel.org
2012-07-22 22:49:08 -04:00
Jesper Juhl
818810472b net: Fix references to out-of-scope variables in put_cmsg_compat()
In net/compat.c::put_cmsg_compat() we may assign 'data' the address of
either the 'ctv' or 'cts' local variables inside the 'if
(!COMPAT_USE_64BIT_TIME)' branch.

Those variables go out of scope at the end of the 'if' statement, so
when we use 'data' further down in 'copy_to_user(CMSG_COMPAT_DATA(cm),
data, cmlen - sizeof(struct compat_cmsghdr))' there's no telling what
it may be refering to - not good.

Fix the problem by simply giving 'ctv' and 'cts' function scope.

Signed-off-by: Jesper Juhl <jj@chaosbits.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-22 17:50:49 -07:00
Artem Bityutskiy
b50924c2c6 ext4: remove unnecessary argument from __ext4_handle_dirty_metadata()
The '__ext4_handle_dirty_metadata()' does not need the 'now' argument
anymore and we can kill it.

Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>
2012-07-22 20:37:31 -04:00
Artem Bityutskiy
4d47603d97 ext4: weed out ext4_write_super
We do not depend on VFS's '->write_super()' anymore and do not need
the 's_dirt' flag anymore, so weed out 'ext4_write_super()' and
's_dirt'.

Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>
2012-07-22 20:35:31 -04:00
Artem Bityutskiy
58c5873a76 ext4: remove unnecessary superblock dirtying
This patch changes the 'ext4_handle_dirty_super()' function which
submits the superblock for I/O in the following cases:

1. When creating the first large file on a file system without
   EXT4_FEATURE_RO_COMPAT_LARGE_FILE feature.
2. When re-sizing the file-system.
3. When creating an xattr on a file-system without the
   EXT4_FEATURE_COMPAT_EXT_ATTR feature.

If the file-system has journal enabled, the superblock is written via
the journal. We do not modify this path.

If the file-system has no journal, this function, falls back to just
marking the superblock as dirty using the 's_dirt' superblock
flag. This means that it delays the actual superblock I/O submission
by 5 seconds (default setting).  Namely, the 'sync_supers()' kernel
thread will call 'ext4_write_super()' later and will actually submit
the superblock for I/O.

And this is the behavior this patch modifies: we stop using 's_dirt'
and just mark the superblock buffer as dirty right away. Indeed, all 3
cases above are extremely rare and it does not add any value to delay
the I/O submission for them.

Note: 'ext4_handle_dirty_super()' executes
'__ext4_handle_dirty_super()' with 'now = 0'. This patch basically
makes the 'now' argument unneeded and it will be deleted in one of the
next patches.

This patch also removes 's_dirt' condition on the unmount path because
we never set it anymore, so we should not test it.

Tested using xfstests for both journalled and non-journalled ext4.

Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>
2012-07-22 20:33:31 -04:00
Jan Kara
044ce47fec ext4: convert last user of ext4_mark_super_dirty() to ext4_handle_dirty_super()
The last user of ext4_mark_super_dirty() in ext4_file_open() is so
rare it can well be modifying the superblock properly by journalling
the change.  Change it and get rid of ext4_mark_super_dirty() as it's
not needed anymore.

Artem: small amendments.
Artem: tested using xfstests for both journalled and non-journalled ext4.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Tested-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
2012-07-22 20:31:31 -04:00
Jan Kara
97a7406880 ext4: remove useless marking of superblock dirty
Commit a0375156 properly notes that superblock doesn't need to be marked
as dirty when only number of free inodes / blocks / number of directories
changes since that is recomputed on each mount anyway. However that comment
leaves some unnecessary markings as dirty in place. Remove these.

Artem: tested using xfstests for both journalled and non-journalled ext4.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Tested-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
2012-07-22 20:29:31 -04:00
Al Viro
254706056b ext4: fix ext4 mismerge back in January
Duplicate caused, AFAICS, by mismerge in
ff9cb1c4ee>

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@vger.kernel.org
2012-07-22 20:27:31 -04:00
Theodore Ts'o
3108b54bce ext4: remove dynamic array size in ext4_chksum()
The ext4_checksum() inline function was using a dynamic array size,
which is not legal C.  (It is a gcc extension).

Remove it.

Cc: "Darrick J. Wong" <djwong@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2012-07-22 20:25:31 -04:00
Theodore Ts'o
8a9918497b ext4: remove unused variable in ext4_update_super()
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2012-07-22 20:23:31 -04:00
Aditya Kali
7c319d3285 ext4: make quota as first class supported feature
This patch adds support for quotas as a first class feature in ext4;
which is to say, the quota files are stored in hidden inodes as file
system metadata, instead of as separate files visible in the file system
directory hierarchy.

It is based on the proposal at:                                                                                                           
https://ext4.wiki.kernel.org/index.php/Design_For_1st_Class_Quota_in_Ext4

This patch introduces a new feature - EXT4_FEATURE_RO_COMPAT_QUOTA
which, when turned on, enables quota accounting at mount time
iteself. Also, the quota inodes are stored in two additional superblock
fields.  Some changes introduced by this patch that should be pointed
out are:

1) Two new ext4-superblock fields - s_usr_quota_inum and
   s_grp_quota_inum for storing the quota inodes in use.
2) Default quota inodes are: inode#3 for tracking userquota and inode#4
   for tracking group quota. The superblock fields can be set to use
   other inodes as well.
3) If the QUOTA feature and corresponding quota inodes are set in
   superblock, the quota usage tracking is turned on at mount time. On
   'quotaon' ioctl, the quota limits enforcement is turned
   on. 'quotaoff' ioctl turns off only the limits enforcement in this
   case.
4) When QUOTA feature is in use, the quota mount options 'quota',
   'usrquota', 'grpquota' are ignored by the kernel.
5) mke2fs or tune2fs can be used to set the QUOTA feature and initialize
   quota inodes. The default reserved inodes will not be visible to user
   as regular files.
6) The quota-tools will need to be modified to support hidden quota
   files on ext4. E2fsprogs will also include support for creating and
   fixing quota files.
7) Support is only for the new V2 quota file format.

Tested-by: Jan Kara <jack@suse.cz>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Johann Lombardi <johann@whamcloud.com>
Signed-off-by: Aditya Kali <adityakali@google.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2012-07-22 20:21:31 -04:00