Commit graph

315824 commits

Author SHA1 Message Date
David S. Miller
4fd551d7be ipv4: Kill rt->rt_oif
Never actually used.

It was being set on output routes to the original OIF specified in the
flow key used for the lookup.

Adjust the only user, ipmr_rt_fib_lookup(), for greater correctness of
the flowi4_oif and flowi4_iif values, thanks to feedback from Julian
Anastasov.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-20 13:38:34 -07:00
David S. Miller
93ac53410a ipv4: Dirty less cache lines in route caching paths.
Don't bother incrementing dst->__use and setting dst->lastuse,
they are completely pointless and just slow things down.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-20 13:36:55 -07:00
David S. Miller
ba3f7f04ef ipv4: Kill FLOWI_FLAG_RT_NOCACHE and associated code.
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-20 13:36:54 -07:00
David S. Miller
d2d68ba9fe ipv4: Cache input routes in fib_info nexthops.
Caching input routes is slightly simpler than output routes, since we
don't need to be concerned with nexthop exceptions.  (locally
destined, and routed packets, never trigger PMTU events or redirects
that will be processed by us).

However, we have to elide caching for the DIRECTSRC and non-zero itag
cases.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-20 13:36:40 -07:00
David S. Miller
f2bb4bedf3 ipv4: Cache output routes in fib_info nexthops.
If we have an output route that lacks nexthop exceptions, we can cache
it in the FIB info nexthop.

Such routes will have DST_HOST cleared because such routes refer to a
family of destinations, rather than just one.

The sequence of the handling of exceptions during route lookup is
adjusted to make the logic work properly.

Before we allocate the route, we lookup the exception.

Then we know if we will cache this route or not, and therefore whether
DST_HOST should be set on the allocated route.

Then we use DST_HOST to key off whether we should store the resulting
route, during rt_set_nexthop(), in the FIB nexthop cache.

With help from Eric Dumazet.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-20 13:36:16 -07:00
David S. Miller
ceb3320610 ipv4: Kill routes during PMTU/redirect updates.
Mark them obsolete so there will be a re-lookup to fetch the
FIB nexthop exception info.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-20 13:31:22 -07:00
David S. Miller
f5b0a87436 net: Document dst->obsolete better.
Add a big comment explaining how the field works, and use defines
instead of magic constants for the values assigned to it.

Suggested by Joe Perches.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-20 13:31:21 -07:00
David S. Miller
f8126f1d51 ipv4: Adjust semantics of rt->rt_gateway.
In order to allow prefixed routes, we have to adjust how rt_gateway
is set and interpreted.

The new interpretation is:

1) rt_gateway == 0, destination is on-link, nexthop is iph->daddr

2) rt_gateway != 0, destination requires a nexthop gateway

Abstract the fetching of the proper nexthop value using a new
inline helper, rt_nexthop(), as suggested by Joe Perches.

Signed-off-by: David S. Miller <davem@davemloft.net>
Tested-by: Vijay Subramanian <subramanian.vijay@gmail.com>
2012-07-20 13:31:20 -07:00
David S. Miller
f1ce3062c5 ipv4: Remove 'rt_dst' from 'struct rtable'
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-20 13:31:19 -07:00
David Miller
b48698895d ipv4: Remove 'rt_mark' from 'struct rtable'
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-20 13:31:18 -07:00
David Miller
d6c0a4f609 ipv4: Kill 'rt_src' from 'struct rtable'
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-20 13:31:00 -07:00
David Miller
1a00fee4ff ipv4: Remove rt_key_{src,dst,tos} from struct rtable.
They are always used in contexts where they can be reconstituted,
or where the finally resolved rt->rt_{src,dst} is semantically
equivalent.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-20 13:30:59 -07:00
David Miller
38a424e465 ipv4: Kill ip_route_input_noref().
The "noref" argument to ip_route_input_common() is now always ignored
because we do not cache routes, and in that case we must always grab
a reference to the resulting 'dst'.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-20 13:30:59 -07:00
David S. Miller
89aef8921b ipv4: Delete routing cache.
The ipv4 routing cache is non-deterministic, performance wise, and is
subject to reasonably easy to launch denial of service attacks.

The routing cache works great for well behaved traffic, and the world
was a much friendlier place when the tradeoffs that led to the routing
cache's design were considered.

What it boils down to is that the performance of the routing cache is
a product of the traffic patterns seen by a system rather than being a
product of the contents of the routing tables.  The former of which is
controllable by external entitites.

Even for "well behaved" legitimate traffic, high volume sites can see
hit rates in the routing cache of only ~%10.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-20 13:30:27 -07:00
Rafael J. Wysocki
75a4161a58 Merge branch 'pm-cpufreq'
* pm-cpufreq:
  cpufreq: Fix sysfs deadlock with concurrent hotplug/frequency switch
  EXYNOS: bugfix on retrieving old_index from freqs.old
2012-07-20 21:39:50 +02:00
Stephen Boyd
a914443627 cpufreq: Fix sysfs deadlock with concurrent hotplug/frequency switch
Running one program that continuously hotplugs and replugs a cpu
concurrently with another program that continuously writes to the
scaling_setspeed node eventually deadlocks with:

=============================================
[ INFO: possible recursive locking detected ]
3.4.0 #37 Tainted: G        W
---------------------------------------------
filemonkey/122 is trying to acquire lock:
 (s_active#13){++++.+}, at: [<c01a3d28>] sysfs_remove_dir+0x9c/0xb4

but task is already holding lock:
 (s_active#13){++++.+}, at: [<c01a22f0>] sysfs_write_file+0xe8/0x140

other info that might help us debug this:
 Possible unsafe locking scenario:

       CPU0
       ----
  lock(s_active#13);
  lock(s_active#13);

 *** DEADLOCK ***

 May be due to missing lock nesting notation

2 locks held by filemonkey/122:
 #0:  (&buffer->mutex){+.+.+.}, at: [<c01a2230>] sysfs_write_file+0x28/0x140
 #1:  (s_active#13){++++.+}, at: [<c01a22f0>] sysfs_write_file+0xe8/0x140

stack backtrace:
[<c0014fcc>] (unwind_backtrace+0x0/0x120) from [<c00ca600>] (validate_chain+0x6f8/0x1054)
[<c00ca600>] (validate_chain+0x6f8/0x1054) from [<c00cb778>] (__lock_acquire+0x81c/0x8d8)
[<c00cb778>] (__lock_acquire+0x81c/0x8d8) from [<c00cb9c0>] (lock_acquire+0x18c/0x1e8)
[<c00cb9c0>] (lock_acquire+0x18c/0x1e8) from [<c01a3ba8>] (sysfs_addrm_finish+0xd0/0x180)
[<c01a3ba8>] (sysfs_addrm_finish+0xd0/0x180) from [<c01a3d28>] (sysfs_remove_dir+0x9c/0xb4)
[<c01a3d28>] (sysfs_remove_dir+0x9c/0xb4) from [<c02d0e5c>] (kobject_del+0x10/0x38)
[<c02d0e5c>] (kobject_del+0x10/0x38) from [<c02d0f74>] (kobject_release+0xf0/0x194)
[<c02d0f74>] (kobject_release+0xf0/0x194) from [<c0565a98>] (cpufreq_cpu_put+0xc/0x24)
[<c0565a98>] (cpufreq_cpu_put+0xc/0x24) from [<c05683f0>] (store+0x6c/0x74)
[<c05683f0>] (store+0x6c/0x74) from [<c01a2314>] (sysfs_write_file+0x10c/0x140)
[<c01a2314>] (sysfs_write_file+0x10c/0x140) from [<c014af44>] (vfs_write+0xb0/0x128)
[<c014af44>] (vfs_write+0xb0/0x128) from [<c014b06c>] (sys_write+0x3c/0x68)
[<c014b06c>] (sys_write+0x3c/0x68) from [<c000e0e0>] (ret_fast_syscall+0x0/0x3c)

This is because store() in cpufreq.c indirectly calls
kobject_get() via cpufreq_cpu_get() and is the last one to call
kobject_put() via cpufreq_cpu_put(). Sysfs code should not call
kobject_get() or kobject_put() directly (see the comment around
sysfs_schedule_callback() for more information).

Fix this deadlock by introducing two new functions:

	struct cpufreq_policy *cpufreq_cpu_get_sysfs(unsigned int cpu)
	void cpufreq_cpu_put_sysfs(struct cpufreq_policy *data)

which do the same thing as cpufreq_cpu_{get,put}() but don't call
kobject functions.

To easily trigger this deadlock you can insert an msleep() with a
reasonably large value right after the fail label at the bottom
of the store() function in cpufreq.c and then write
scaling_setspeed in one task and offline the cpu in another. The
first task will hang and be detected by the hung task detector.

Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2012-07-20 21:39:25 +02:00
Linus Torvalds
d75e2c9ad9 Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus
Pull late MIPS fixes from Ralf Baechle:
 "This fixes a number of lose ends in the MIPS code and various bug
  fixes.

  Aside of dropping some patch that should not be in this pull request
  everything has sat in -next for quite a while and there are no known
  issues.

  The biggest patch in this patch set moves the allocation of an array
  that is aliased to a function (for runtime generated code) to
  assembler code.  This avoids an issue with certain toolchains when
  building for microMIPS."

* 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus: (35 commits)
  MIPS: PCI: Move fixups from __init to __devinit.
  MIPS: Fix bug.h MIPS build regression
  MIPS: sync-r4k: remove redundant irq operation
  MIPS: smp: Warn on too early irq enable
  MIPS: call set_cpu_online() on cpu being brought up with irq disabled
  MIPS: call ->smp_finish() a little late
  MIPS: Yosemite: delay irq enable to ->smp_finish()
  MIPS: SMTC: delay irq enable to ->smp_finish()
  MIPS: BMIPS: delay irq enable to ->smp_finish()
  MIPS: Octeon: delay enable irq to ->smp_finish()
  MIPS: Oprofile: Fix build as a module.
  MIPS: BCM63XX: Fix BCM6368 IPSec clock bit
  MIPS: perf: Fix build error caused by unused counters_per_cpu_to_total()
  MIPS: Fix Magic SysRq L kernel crash.
  MIPS: BMIPS: Fix duplicate header inclusion.
  mips: mark const init data with __initconst instead of __initdata
  MIPS: cmpxchg.h: Add missing include
  MIPS: Malta may also be equipped with MIPS64 R2 processors.
  MIPS: Fix typo multipy -> multiply
  MIPS: Cavium: Fix duplicate ARCH_SPARSEMEM_ENABLE in kconfig.
  ...
2012-07-20 12:02:02 -07:00
Linus Torvalds
935173744a Three fixes for device-mapper discard processing:
- avoid a crash in dm-raid1 when discards coincide with mirror recovery;
   - avoid discarding shared data that's still needed in dm-thin;
   - don't guarantee that discarded blocks will be wiped in dm-raid1.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABAgAGBQJQCV8qAAoJEK2W1qbAHj1niSAP/2K0RkgWvL0hwuaM+us0oh29
 XFou6Tb9pH+//QfKOJuClHeSfZFoHYuevvJPtwTqPlHGONE2YXeBtVmyp0k+BS69
 xoaQy+OoZFrEbhxyJFrg+lDcxVGRtvo7x9zegeRf++o/skRfRgAjzyLkI8bk4t3v
 c3vSDTVBikJXlTxa+J7EQpeW29DBiky+tIHQQx0+98u2VSlaFFP6MdLr1ROeq7yF
 +z3kEXk6qzwL9ZHTWuVCvhi7bw4i18UTrH0wxZuUXWRpz+Va5h7w+/zcQbau6D/s
 K+BmlAW/fxzZOW4guFU6pCLlVGU4BsJxUXT55UaP4Dx9UuV59EtIPsDb8/Y/pGMX
 t9xnC4GmSOjw52pW2VR2gUJwG/c5mJ9g/mdP6twQzcC4JJ+CYg4Q5lH88qzDqceS
 VCrW681nIKIVoja5n1adv6gbZax8hlR/z8ElXrqELDmXk7nKBLOLdDVSXzZ9ceX1
 RnvtAZE/zrxcslKHw52Sd37c8YRer/fgx3kQxhXd1nb096DgiWvE/taD/ixjWHQX
 Eu1KrQIelvw63/BNNTKYRF7xS0dGKsGNaXWln7cMONG28CnrWG/8f+mp+KG73x5e
 Fc8yCONHNbqmf95yx1N0MgfYlZFjBBw0+BtqmR7QVcnG3r4SaSug+F72SPb5nN/B
 ZBmwNcSBaaC952+5pMZa
 =gbLp
 -----END PGP SIGNATURE-----

Merge tag 'dm-3.5-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/agk/linux-dm

Pull device-mapper discard fixes from Alasdair G Kergon:
  - avoid a crash in dm-raid1 when discards coincide with mirror
    recovery;
  - avoid discarding shared data that's still needed in dm-thin;
  - don't guarantee that discarded blocks will be wiped in dm-raid1.

* tag 'dm-3.5-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/agk/linux-dm:
  dm raid1: set discard_zeroes_data_unsupported
  dm thin: do not send discards to shared blocks
  dm raid1: fix crash with mirror recovery and discard
2012-07-20 11:51:22 -07:00
Linus Torvalds
ce9f8d6b39 Merge branch 'for-linus' of git://git.open-osd.org/linux-open-osd
Pull pnfs/ore fixes from Boaz Harrosh:
 "These are catastrophic fixes to the pnfs objects-layout that were just
  discovered.  They are also destined for @stable.

  I have found these and worked on them at around RC1 time but
  unfortunately went to the hospital for kidney stones and had a very
  slow recovery.  I refrained from sending them as is, before proper
  testing, and surly I have found a bug just yesterday.

  So now they are all well tested, and have my sign-off.  Other then
  fixing the problem at hand, and assuming there are no bugs at the new
  code, there is low risk to any surrounding code.  And in anyway they
  affect only these paths that are now broken.  That is RAID5 in pnfs
  objects-layout code.  It does also affect exofs (which was not broken)
  but I have tested exofs and it is lower priority then objects-layout
  because no one is using exofs, but objects-layout has lots of users."

* 'for-linus' of git://git.open-osd.org/linux-open-osd:
  pnfs-obj: Fix __r4w_get_page when offset is beyond i_size
  pnfs-obj: don't leak objio_state if ore_write/read fails
  ore: Unlock r4w pages in exact reverse order of locking
  ore: Remove support of partial IO request (NFS crash)
  ore: Fix NFS crash by supporting any unaligned RAID IO
2012-07-20 11:43:53 -07:00
Linus Torvalds
1793416287 Fix a bug in UBIFS free space fix-up reported already twice recently:
http://lists.infradead.org/pipermail/linux-mtd/2012-May/041408.html
 http://lists.infradead.org/pipermail/linux-mtd/2012-June/042422.html
 
 and we finally have the fix. I am quite confident the fix is correct
 because I could reproduce the problem with nandsim and verify the
 fix. It was also verified by Iwo (the reporter).
 
 I am also confident that this is OK to merge the fix so late because
 this patch affects only the fixup functionality, which is not used by
 most users.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIcBAABAgAGBQJQCQZ1AAoJECmIfjd9wqK0fosP/RD3Ruo5ILvTtBThKJPUoeld
 kihD9w3rk26cILlpGA3Cs/kaoOj/wPtjMVKGVkw50cWKRQemFLMh4ZcbCepfae+b
 g+YsH+ihkINgjdpKM351lgSCS+NEPJ695zmxNJ+/zjM5+ewfP6vK0qivnjF7w81k
 jLAVt80a1nhjNPyDMeQVr69HxBegYuX927LL4onJULYqvmrSiX/5tXzI+02emjDf
 9gA99fyc4pLNAJzzQyr44pogNaSME+Q90p4PAd11tlaVfn1kXgCXA3Ybv2cy7cer
 ipQfHQzfMjiCMO7Kpt5Ja3necuTarZsHV4UtmXhc4uIOr5p57dJX7RfBzA3j4RmV
 2ZFynqjl7n6ZT0pAM/0F9h9FyjZrCcgg1BGcEsqfJv2Yu7txOX1Qo2gkEvYJl8Sx
 Q2G6xNdzyib8MXClm4L2Zix16WqAF7CyUZo+szUTpdO8PPzgJ/vNpAk+3yqoVeep
 0Dr0HmTMRuP6tJGa9TH58QlvhClkXGSb7ukk1UlV4RVXtvvYtjVBwUXoHSUHNDJO
 HB9B+7ViTIjm9fdILqCX5wtrnZZQgFd1hBiQ/13/ZFrtB1hz5WfOdgfLRIBifjbq
 hGkwQyb5zsWTm7KGTOV0Yncmbnkut4zSJpMCbjZvcPJ2r5zwNwImKdvLBJ7oCKmd
 nPZ2dJmJYdKw2L00SzGZ
 =bUPG
 -----END PGP SIGNATURE-----

Merge tag 'upstream-3.5-rc8' of git://git.infradead.org/linux-ubifs

Pull UBIFS free space fix-up bugfix from Artem Bityutskiy:
 "It's been reported already twice recently:

    http://lists.infradead.org/pipermail/linux-mtd/2012-May/041408.html
    http://lists.infradead.org/pipermail/linux-mtd/2012-June/042422.html

  and we finally have the fix.  I am quite confident the fix is correct
  because I could reproduce the problem with nandsim and verify the fix.
  It was also verified by Iwo (the reporter).

  I am also confident that this is OK to merge the fix so late because
  this patch affects only the fixup functionality, which is not used by
  most users."

* tag 'upstream-3.5-rc8' of git://git.infradead.org/linux-ubifs:
  UBIFS: fix a bug in empty space fix-up
2012-07-20 11:42:30 -07:00
Dan Carpenter
2962846d14 target: NULL dereference on error path
During a failure in transport_add_device_to_core_hba() code, we called
destroy_workqueue(dev->tmr_wq) before ->tmr_wq was allocated which leads
to an oops.

This fixes a regression introduced in with:

commit af8772926f
Author: Christoph Hellwig <hch@infradead.org>
Date:   Sun Jul 8 15:58:49 2012 -0400

    target: replace the processing thread with a TMR work queue

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2012-07-20 11:34:21 -07:00
Cloud Ren
fa0afcd109 atl1c: fix issue of io access mode for AR8152 v2.1
When io access mode is enabled by BOOTROM or BIOS for AR8152 v2.1,
the register can't be read/write by memory access mode.
Clearing Bit 8  of Register 0x21c could fixed the issue.

Signed-off-by: Cloud Ren <cjren@qca.qualcomm.com>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: xiong <xiong@qca.qualcomm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-20 11:21:18 -07:00
Mikulas Patocka
b09e786bd1 tun: fix a crash bug and a memory leak
This patch fixes a crash
tun_chr_close -> netdev_run_todo -> tun_free_netdev -> sk_release_kernel ->
sock_release -> iput(SOCK_INODE(sock))
introduced by commit 1ab5ecb90c

The problem is that this socket is embedded in struct tun_struct, it has
no inode, iput is called on invalid inode, which modifies invalid memory
and optionally causes a crash.

sock_release also decrements sockets_in_use, this causes a bug that
"sockets: used" field in /proc/*/net/sockstat keeps on decreasing when
creating and closing tun devices.

This patch introduces a flag SOCK_EXTERNALLY_ALLOCATED that instructs
sock_release to not free the inode and not decrement sockets_in_use,
fixing both memory corruption and sockets_in_use underflow.

It should be backported to 3.3 an 3.4 stabke.

Signed-off-by: Mikulas Patocka <mikulas@artax.karlin.mff.cuni.cz>
Cc: stable@kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-20 11:21:06 -07:00
Julian Anastasov
521f549097 ipv4: show pmtu in route list
Override the metrics with rt_pmtu

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-20 11:16:49 -07:00
David S. Miller
e4bce0f288 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-next
Jerr Kirsher says:

====================
This series contains updates to ixgbe.
 ...
Alexander Duyck (9):
  ixgbe: Use VMDq offset to indicate the default pool
  ixgbe: Fix memory leak when SR-IOV VFs are direct assigned
  ixgbe: Drop references to deprecated pci_ DMA api and instead use
    dma_ API
  ixgbe: Cleanup configuration of FCoE registers
  ixgbe: Merge all FCoE percpu values into a single structure
  ixgbe: Make FCoE allocation and configuration closer to how rings
    work
  ixgbe: Correctly set SAN MAC RAR pool to default pool of PF
  ixgbe: Only enable anti-spoof on VF pools
  ixgbe: Enable FCoE FSO and CRC offloads based on CAPABLE instead of
    ENABLED flag
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-20 11:11:59 -07:00
David S. Miller
aac3942ced Merge branch 'team_multiq'
Jiri Pirko says:

====================
This patchset represents the way I walked when I was adding multiqueue
support for team driver.

Jiri Pirko (6):
  net: honour netif_set_real_num_tx_queues() retval
  rtnl: allow to specify different num for rx and tx queue count
  rtnl: allow to specify number of rx and tx queues on device creation
  net: rename bond_queue_mapping to slave_dev_queue_mapping
  bond_sysfs: use ream_num_tx_queues rather than params.tx_queue
  team: add multiqueue support
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-20 11:07:37 -07:00
Jiri Pirko
6c85f2bdda team: add multiqueue support
Largely copied from bonding code.

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-20 11:07:00 -07:00
Jiri Pirko
8a540ff9e1 bond_sysfs: use real_num_tx_queues rather than params.tx_queue
Since now number of tx queues can be specified during bond instance
creation and therefore it may differ from params.tx_queues, use rather
real_num_tx_queues for boundary check.

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-20 11:07:00 -07:00
Jiri Pirko
df4ab5b3c2 net: rename bond_queue_mapping to slave_dev_queue_mapping
As this is going to be used not only by bonding.

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-20 11:07:00 -07:00
Jiri Pirko
76ff5cc919 rtnl: allow to specify number of rx and tx queues on device creation
This patch introduces IFLA_NUM_TX_QUEUES and IFLA_NUM_RX_QUEUES by
which userspace can set number of rx and/or tx queues to be allocated
for newly created netdevice.
This overrides ops->get_num_[tr]x_queues()

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-20 11:07:00 -07:00
Jiri Pirko
d40156aa5e rtnl: allow to specify different num for rx and tx queue count
Also cut out unused function parameters and possible err in return
value.

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-20 11:06:59 -07:00
Jiri Pirko
ee6ae1a1d5 net: honour netif_set_real_num_tx_queues() retval
In netif_copy_real_num_queues() the return value of
netif_set_real_num_tx_queues() should be checked.

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-20 11:06:59 -07:00
Eric Dumazet
6f458dfb40 tcp: improve latencies of timer triggered events
Modern TCP stack highly depends on tcp_write_timer() having a small
latency, but current implementation doesn't exactly meet the
expectations.

When a timer fires but finds the socket is owned by the user, it rearms
itself for an additional delay hoping next run will be more
successful.

tcp_write_timer() for example uses a 50ms delay for next try, and it
defeats many attempts to get predictable TCP behavior in term of
latencies.

Use the recently introduced tcp_release_cb(), so that the user owning
the socket will call various handlers right before socket release.

This will permit us to post a followup patch to address the
tcp_tso_should_defer() syndrome (some deferred packets have to wait
RTO timer to be transmitted, while cwnd should allow us to send them
sooner)

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Tom Herbert <therbert@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Nandita Dukkipati <nanditad@google.com>
Cc: H.K. Jerry Chu <hkchu@google.com>
Cc: John Heffner <johnwheffner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-20 10:59:41 -07:00
Eric Dumazet
9dc274151a tcp: fix ABC in tcp_slow_start()
When/if sysctl_tcp_abc > 1, we expect to increase cwnd by 2 if the
received ACK acknowledges more than 2*MSS bytes, in tcp_slow_start()

Problem is this RFC 3465 statement is not correctly coded, as
the while () loop increases snd_cwnd one by one.

Add a new variable to avoid this off-by one error.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Tom Herbert <therbert@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Nandita Dukkipati <nanditad@google.com>
Cc: John Heffner <johnwheffner@gmail.com>
Cc: Stephen Hemminger <shemminger@vyatta.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-20 10:59:41 -07:00
Eric Dumazet
5815d5e7aa tcp: use hash_32() in tcp_metrics
Fix a missing roundup_pow_of_two(), since tcpmhash_entries is not
guaranteed to be a power of two.

Uses hash_32() instead of custom hash.

tcpmhash_entries should be an unsigned int.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-20 10:59:41 -07:00
Vijay Subramanian
67b95bd78f tcp: Return bool instead of int where appropriate
Applied to a set of static inline functions in tcp_input.c

Signed-off-by: Vijay Subramanian <subramanian.vijay@gmail.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-20 10:59:41 -07:00
Jon Mason
36e90319f3 ixgbe: use PCI_VENDOR_ID_INTEL
Use PCI_VENDOR_ID_INTEL from pci_ids.h instead of creating its own
vendor ID #define.

Signed-off-by: Jon Mason <jdmason@kudzu.us>
Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Cc: Jesse Brandeburg <jesse.brandeburg@intel.com>
Cc: Bruce Allan <bruce.w.allan@intel.com>
Cc: Carolyn Wyborny <carolyn.wyborny@intel.com>
Cc: Don Skidmore <donald.c.skidmore@intel.com>
Cc: Greg Rose <gregory.v.rose@intel.com>
Cc: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
Cc: Alex Duyck <alexander.h.duyck@intel.com>
Cc: John Ronciak <john.ronciak@intel.com>
Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-20 10:59:41 -07:00
Jon Mason
61dc53341d ixgb: use PCI_VENDOR_ID_INTEL
Use PCI_VENDOR_ID_INTEL from pci_ids.h instead of creating its own
vendor ID #define.

Signed-off-by: Jon Mason <jdmason@kudzu.us>
Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Cc: Jesse Brandeburg <jesse.brandeburg@intel.com>
Cc: Bruce Allan <bruce.w.allan@intel.com>
Cc: Carolyn Wyborny <carolyn.wyborny@intel.com>
Cc: Don Skidmore <donald.c.skidmore@intel.com>
Cc: Greg Rose <gregory.v.rose@intel.com>
Cc: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
Cc: Alex Duyck <alexander.h.duyck@intel.com>
Cc: John Ronciak <john.ronciak@intel.com>
Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-20 10:59:41 -07:00
Jon Mason
5e80bc783f myri10ge: update MAINTAINERS
Remove myself from myri10ge MAINTAINERS list

Signed-off-by: Jon Mason <jdmason@kudzu.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-20 10:59:41 -07:00
David S. Miller
54f0e9ba95 Merge branch 'for-davem' of git://gitorious.org/linux-can/linux-can-next
Marc Kleine-Budde says:

====================
the fifth pull request for upcoming v3.6 net-next cleans up and
improves the janz-ican3 driver (6 patches by Ira W. Snyder, one by me).
A patch by Steffen Trumtrar adds imx53 support to the flexcan driver.
And another patch by me, which marks the bit timing constant in the CAN
drivers as "const".
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-20 10:56:03 -07:00
John W. Linville
90b90f60c4 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next into for-davem 2012-07-20 12:30:48 -04:00
Ira W. Snyder
3b5c6b9e49 can: janz-ican3: add support for one shot mode
The Janz VMOD-ICAN3 hardware has support for one shot packet
transmission. This means that a packet will be attempted to be sent
once, with no automatic retries.

The SocketCAN core has a controller-wide setting for this mode:
CAN_CTRLMODE_ONE_SHOT. The Janz VMOD-ICAN3 hardware supports this flag
on a per-packet level, but the SocketCAN core does not.

Signed-off-by: Ira W. Snyder <iws@ovro.caltech.edu>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2012-07-20 17:49:05 +02:00
Ira W. Snyder
30df5888e4 can: janz-ican3: avoid firmware lockup caused by infinite bus error quota
If the bus error quota is set to infinite and the host CPU cannot keep
up, the Janz VMOD-ICAN3 firmware will stop responding to control
messages until the controller is reset.

The firmware will automatically stop sending bus error messages when the
quota is reached, and will only resume sending bus error messages when
the quota is re-set to a positive value.

This limitation is worked around by setting the bus error quota to one
message, and then re-setting the quota to one message every time a bus
error message is received. By doing this, the firmware never stops
responding to control messages. The CAN bus can be reset without a
hard-reset of the controller card.

Signed-off-by: Ira W. Snyder <iws@ovro.caltech.edu>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2012-07-20 17:49:05 +02:00
Ira W. Snyder
83702f6927 can: janz-ican3: fix support for CAN_RAW_RECV_OWN_MSGS
The Janz VMOD-ICAN3 firmware does not support any sort of TX-done
notification or interrupt. The driver previously used the hardware
loopback to attempt to work around this deficiency, but this caused all
sockets to receive all messages, even if CAN_RAW_RECV_OWN_MSGS is off.

Using the new function ican3_cmp_echo_skb(), we can drop the loopback
messages and return the original skbs. This fixes the issues with
CAN_RAW_RECV_OWN_MSGS.

A private skb queue is used to store the echo skbs. This avoids the need
for any index management.

Due to a lack of TX-error interrupts, bus errors are permanently
enabled, and are used as a TX-error notification. This is used to drop
an echo skb when transmission fails. Bus error packets are not generated
if the user has not enabled bus error reporting.

Signed-off-by: Ira W. Snyder <iws@ovro.caltech.edu>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2012-07-20 17:49:04 +02:00
Ira W. Snyder
88b587039c can: janz-ican3: fix error and byte counters
The error and byte counter statistics were being incremented
incorrectly. For example, a TX error would be counted both in tx_errors
and rx_errors.

This corrects the problem so that tx_errors and rx_errors are only
incremented for errors caused by packets sent to the bus. Error packets
generated by the driver are not counted.

The byte counters are only increased for packets which are actually
transmitted or received from the bus. Error packets generated by the
driver are not counted.

Signed-off-by: Ira W. Snyder <iws@ovro.caltech.edu>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2012-07-20 17:49:03 +02:00
Marc Kleine-Budde
9e4d6909a2 can: janz-ican3: cleanup of ican3_to_can_frame and can_frame_to_ican3
This patch cleans up the ICAN3 to Linux CAN frame and vice versa
conversion functions:

- RX: Use get_can_dlc() to limit the dlc value.
- RX+TX: Don't copy the whole frame, only copy the amount of bytes
  specified in cf->can_dlc.

Acked-by: Ira W. Snyder <iws@ovro.caltech.edu>
Tested-by: Ira W. Snyder <iws@ovro.caltech.edu>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2012-07-20 17:48:53 +02:00
Mikulas Patocka
7c8d3a42fe dm raid1: set discard_zeroes_data_unsupported
We can't guarantee that REQ_DISCARD on dm-mirror zeroes the data even if
the underlying disks support zero on discard.  So this patch sets
ti->discard_zeroes_data_unsupported.

For example, if the mirror is in the process of resynchronizing, it may
happen that kcopyd reads a piece of data, then discard is sent on the
same area and then kcopyd writes the piece of data to another leg.
Consequently, the data is not zeroed.

The flag was made available by commit 983c7db347
(dm crypt: always disable discard_zeroes_data).

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: stable@kernel.org
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2012-07-20 14:25:07 +01:00
Mikulas Patocka
650d2a06b4 dm thin: do not send discards to shared blocks
When process_discard receives a partial discard that doesn't cover a
full block, it sends this discard down to that block. Unfortunately, the
block can be shared and the discard would corrupt the other snapshots
sharing this block.

This patch detects block sharing and ends the discard with success when
sending it to the shared block.

The above change means that if the device supports discard it can't be
guaranteed that a discard request zeroes data. Therefore, we set
ti->discard_zeroes_data_unsupported.

Thin target discard support with this bug arrived in commit
104655fd4d (dm thin: support discards).

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: stable@kernel.org
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2012-07-20 14:25:05 +01:00
Mikulas Patocka
751f188dd5 dm raid1: fix crash with mirror recovery and discard
This patch fixes a crash when a discard request is sent during mirror
recovery.

Firstly, some background.  Generally, the following sequence happens during
mirror synchronization:
- function do_recovery is called
- do_recovery calls dm_rh_recovery_prepare
- dm_rh_recovery_prepare uses a semaphore to limit the number
  simultaneously recovered regions (by default the semaphore value is 1,
  so only one region at a time is recovered)
- dm_rh_recovery_prepare calls __rh_recovery_prepare,
  __rh_recovery_prepare asks the log driver for the next region to
  recover. Then, it sets the region state to DM_RH_RECOVERING. If there
  are no pending I/Os on this region, the region is added to
  quiesced_regions list. If there are pending I/Os, the region is not
  added to any list. It is added to the quiesced_regions list later (by
  dm_rh_dec function) when all I/Os finish.
- when the region is on quiesced_regions list, there are no I/Os in
  flight on this region. The region is popped from the list in
  dm_rh_recovery_start function. Then, a kcopyd job is started in the
  recover function.
- when the kcopyd job finishes, recovery_complete is called. It calls
  dm_rh_recovery_end. dm_rh_recovery_end adds the region to
  recovered_regions or failed_recovered_regions list (depending on
  whether the copy operation was successful or not).

The above mechanism assumes that if the region is in DM_RH_RECOVERING
state, no new I/Os are started on this region. When I/O is started,
dm_rh_inc_pending is called, which increases reg->pending count. When
I/O is finished, dm_rh_dec is called. It decreases reg->pending count.
If the count is zero and the region was in DM_RH_RECOVERING state,
dm_rh_dec adds it to the quiesced_regions list.

Consequently, if we call dm_rh_inc_pending/dm_rh_dec while the region is
in DM_RH_RECOVERING state, it could be added to quiesced_regions list
multiple times or it could be added to this list when kcopyd is copying
data (it is assumed that the region is not on any list while kcopyd does
its jobs). This results in memory corruption and crash.

There already exist bypasses for REQ_FLUSH requests: REQ_FLUSH requests
do not belong to any region, so they are always added to the sync list
in do_writes. dm_rh_inc_pending does not increase count for REQ_FLUSH
requests. In mirror_end_io, dm_rh_dec is never called for REQ_FLUSH
requests. These bypasses avoid the crash possibility described above.

These bypasses were improperly implemented for REQ_DISCARD when
the mirror target gained discard support in commit
5fc2ffeabb (dm raid1: support discard).

In do_writes, REQ_DISCARD requests is always added to the sync queue and
immediately dispatched (even if the region is in DM_RH_RECOVERING).  However,
dm_rh_inc and dm_rh_dec is called for REQ_DISCARD resusts.  So it violates the
rule that no I/Os are started on DM_RH_RECOVERING regions, and causes the list
corruption described above.

This patch changes it so that REQ_DISCARD requests follow the same path
as REQ_FLUSH. This avoids the crash.

Reference: https://bugzilla.redhat.com/837607

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: stable@kernel.org
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2012-07-20 14:25:03 +01:00
Alexandre Pereira da Silva
c49a183086 ARM: LPC32xx: Add PWM support
This SoC has two PWM channels

Signed-off-by: Alexandre Pereira da Silva <aletes.xgr@gmail.com>
Signed-off-by: Roland Stigge <stigge@antcom.de>
2012-07-20 14:01:51 +02:00