Commit graph

2,232 commits

Author SHA1 Message Date
Cong Wang
aa5d265934 pktgen: correctly handle failures when adding a device
[ Upstream commit 604dfd6efc ]

The return value of pktgen_add_device() is not checked, so
even if we fail to add some device, for example, non-exist one,
we still see "OK:...". This patch fixes it.

After this patch, I got:

	# echo "add_device non-exist" > /proc/net/pktgen/kpktgend_0
	-bash: echo: write error: No such device
	# cat /proc/net/pktgen/kpktgend_0
	Running:
	Stopped:
	Result: ERROR: can not add device non-exist
	# echo "add_device eth0" > /proc/net/pktgen/kpktgend_0
	# cat /proc/net/pktgen/kpktgend_0
	Running:
	Stopped: eth0
	Result: OK: add_device=eth0

(Candidate for -stable)

Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-02-14 10:47:34 -08:00
Eric Dumazet
b18401af7f rtnetlink: fix rtnl_calcit() and rtnl_dump_ifinfo()
commit a4b64fbe48 upstream.

nlmsg_parse() might return an error, so test its return value before
potential random memory accesses.

Errors introduced in commit 115c9b8192 (rtnetlink: Fix problem with
buffer allocation)

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Greg Rose <gregory.v.rose@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cc: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-01-17 08:43:58 -08:00
Greg Rose
a0d3aa1f04 rtnetlink: Fix problem with buffer allocation
commit 115c9b8192 upstream.

Implement a new netlink attribute type IFLA_EXT_MASK.  The mask
is a 32 bit value that can be used to indicate to the kernel that
certain extended ifinfo values are requested by the user application.
At this time the only mask value defined is RTEXT_FILTER_VF to
indicate that the user wants the ifinfo dump to send information
about the VFs belonging to the interface.

This patch fixes a bug in which certain applications do not have
large enough buffers to accommodate the extra information returned
by the kernel with large numbers of SR-IOV virtual functions.
Those applications will not send the new netlink attribute with
the interface info dump request netlink messages so they will
not get unexpectedly large request buffers returned by the kernel.

Modifies the rtnl_calcit function to traverse the list of net
devices and compute the minimum buffer size that can hold the
info dumps of all matching devices based upon the filter passed
in via the new netlink attribute filter mask.  If no filter
mask is sent then the buffer allocation defaults to NLMSG_GOODSIZE.

With this change it is possible to add yet to be defined netlink
attributes to the dump request which should make it fairly extensible
in the future.

Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
Acked-by: Greg Rose <gregory.v.rose@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.0:
 - Adjust context
 - Drop the change in do_setlink() that reverts commit f18da14565
   ('net: RTNETLINK adjusting values of min_ifinfo_dump_size'), which
   was never applied here]
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-01-17 08:43:58 -08:00
Greg Rose
2e3cbdeae8 rtnetlink: Compute and store minimum ifinfo dump size
commit c7ac8679be upstream.

The message size allocated for rtnl ifinfo dumps was limited to
a single page.  This is not enough for additional interface info
available with devices that support SR-IOV and caused a bug in
which VF info would not be displayed if more than approximately
40 VFs were created per interface.

Implement a new function pointer for the rtnl_register service that will
calculate the amount of data required for the ifinfo dump and allocate
enough data to satisfy the request.

Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Cc: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-01-17 08:43:58 -08:00
Tom Herbert
3cc4eadd56 net-rps: Fix brokeness causing OOO packets
[ Upstream commit baefa31db2 ]

In commit c445477d74 which adds aRFS to the kernel, the CPU
selected for RFS is not set correctly when CPU is changing.
This is causing OOO packets and probably other issues.

Signed-off-by: Tom Herbert <therbert@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Acked-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-11-26 11:34:36 -08:00
Jiri Pirko
c5d6a96676 net: correct check in dev_addr_del()
[ Upstream commit a652208e0b ]

Check (ha->addr == dev->dev_addr) is always true because dev_addr_init()
sets this. Correct the check to behave properly on addr removal.

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-11-26 11:34:36 -08:00
ramesh.nagappa@gmail.com
5891cb7c82 net: Fix skb_under_panic oops in neigh_resolve_output
[ Upstream commit e1f165032c ]

The retry loop in neigh_resolve_output() and neigh_connected_output()
call dev_hard_header() with out reseting the skb to network_header.
This causes the retry to fail with skb_under_panic. The fix is to
reset the network_header within the retry loop.

Signed-off-by: Ramesh Nagappa <ramesh.nagappa@ericsson.com>
Reviewed-by: Shawn Lu <shawn.lu@ericsson.com>
Reviewed-by: Robert Coulson <robert.coulson@ericsson.com>
Reviewed-by: Billie Alsup <billie.alsup@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-10-28 10:02:12 -07:00
Amerigo Wang
dd81262194 pktgen: fix crash when generating IPv6 packets
commit 5aa8b57200 upstream.

For IPv6, sizeof(struct ipv6hdr) = 40, thus the following
expression will result negative:

        datalen = pkt_dev->cur_pkt_size - 14 -
                  sizeof(struct ipv6hdr) - sizeof(struct udphdr) -
                  pkt_dev->pkt_overhead;

And,  the check "if (datalen < sizeof(struct pktgen_hdr))" will be
passed as "datalen" is promoted to unsigned, therefore will cause
a crash later.

This is a quick fix by checking if "datalen" is negative. The following
patch will increase the default value of 'min_pkt_size' for IPv6.

This bug should exist for a long time, so Cc -stable too.

Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-10-21 09:17:11 -07:00
Ed Cashin
70875a0484 net: do not disable sg for packets requiring no checksum
[ Upstream commit c0d680e577 ]

A change in a series of VLAN-related changes appears to have
inadvertently disabled the use of the scatter gather feature of
network cards for transmission of non-IP ethernet protocols like ATA
over Ethernet (AoE).  Below is a reference to the commit that
introduces a "harmonize_features" function that turns off scatter
gather when the NIC does not support hardware checksumming for the
ethernet protocol of an sk buff.

  commit f01a5236bd
  Author: Jesse Gross <jesse@nicira.com>
  Date:   Sun Jan 9 06:23:31 2011 +0000

      net offloading: Generalize netif_get_vlan_features().

The can_checksum_protocol function is not equipped to consider a
protocol that does not require checksumming.  Calling it for a
protocol that requires no checksum is inappropriate.

The patch below has harmonize_features call can_checksum_protocol when
the protocol needs a checksum, so that the network layer is not forced
to perform unnecessary skb linearization on the transmission of AoE
packets.  Unnecessary linearization results in decreased performance
and increased memory pressure, as reported here:

  http://www.spinics.net/lists/linux-mm/msg15184.html

The problem has probably not been widely experienced yet, because
only recently has the kernel.org-distributed aoe driver acquired the
ability to use payloads of over a page in size, with the patchset
recently included in the mm tree:

  https://lkml.org/lkml/2012/8/28/140

The coraid.com-distributed aoe driver already could use payloads of
greater than a page in size, but its users generally do not use the
newest kernels.

Signed-off-by: Ed Cashin <ecashin@coraid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-10-13 05:28:08 +09:00
Eric Dumazet
1a6b2c9da0 net: guard tcp_set_keepalive() to tcp sockets
[ Upstream commit 3e10986d1d ]

Its possible to use RAW sockets to get a crash in
tcp_set_keepalive() / sk_reset_timer()

Fix is to make sure socket is a SOCK_STREAM one.

Reported-by: Dave Jones <davej@redhat.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-10-13 05:28:07 +09:00
Chema Gonzalez
74665a9b4f net: small bug on rxhash calculation
[ Upstream commit 6862234238 ]

In the current rxhash calculation function, while the
sorting of the ports/addrs is coherent (you get the
same rxhash for packets sharing the same 4-tuple, in
both directions), ports and addrs are sorted
independently. This implies packets from a connection
between the same addresses but crossed ports hash to
the same rxhash.

For example, traffic between A=S:l and B=L:s is hashed
(in both directions) from {L, S, {s, l}}. The same
rxhash is obtained for packets between C=S:s and D=L:l.

This patch ensures that you either swap both addrs and ports,
or you swap none. Traffic between A and B, and traffic
between C and D, get their rxhash from different sources
({L, S, {l, s}} for A<->B, and {L, S, {s, l}} for C<->D)

The patch is co-written with Eric Dumazet <edumazet@google.com>

Signed-off-by: Chema Gonzalez <chema@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-10-13 05:28:07 +09:00
Eric Dumazet
81cee4e9e6 drop_monitor: dont sleep in atomic context
commit bec4596b4e upstream.

drop_monitor calls several sleeping functions while in atomic context.

 BUG: sleeping function called from invalid context at mm/slub.c:943
 in_atomic(): 1, irqs_disabled(): 0, pid: 2103, name: kworker/0:2
 Pid: 2103, comm: kworker/0:2 Not tainted 3.5.0-rc1+ #55
 Call Trace:
  [<ffffffff810697ca>] __might_sleep+0xca/0xf0
  [<ffffffff811345a3>] kmem_cache_alloc_node+0x1b3/0x1c0
  [<ffffffff8105578c>] ? queue_delayed_work_on+0x11c/0x130
  [<ffffffff815343fb>] __alloc_skb+0x4b/0x230
  [<ffffffffa00b0360>] ? reset_per_cpu_data+0x160/0x160 [drop_monitor]
  [<ffffffffa00b022f>] reset_per_cpu_data+0x2f/0x160 [drop_monitor]
  [<ffffffffa00b03ab>] send_dm_alert+0x4b/0xb0 [drop_monitor]
  [<ffffffff810568e0>] process_one_work+0x130/0x4c0
  [<ffffffff81058249>] worker_thread+0x159/0x360
  [<ffffffff810580f0>] ? manage_workers.isra.27+0x240/0x240
  [<ffffffff8105d403>] kthread+0x93/0xa0
  [<ffffffff816be6d4>] kernel_thread_helper+0x4/0x10
  [<ffffffff8105d370>] ? kthread_freezable_should_stop+0x80/0x80
  [<ffffffff816be6d0>] ? gs_change+0xb/0xb

Rework the logic to call the sleeping functions in right context.

Use standard timer/workqueue api to let system chose any cpu to perform
the allocation and netlink send.

Also avoid a loop if reset_per_cpu_data() cannot allocate memory :
use mod_timer() to wait 1/10 second before next try.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Neil Horman <nhorman@tuxdriver.com>
Reviewed-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cc: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-10-02 09:47:42 -07:00
Neil Horman
2c51de7f14 drop_monitor: prevent init path from scheduling on the wrong cpu
commit 4fdcfa1284 upstream.

I just noticed after some recent updates, that the init path for the drop
monitor protocol has a minor error.  drop monitor maintains a per cpu structure,
that gets initalized from a single cpu.  Normally this is fine, as the protocol
isn't in use yet, but I recently made a change that causes a failed skb
allocation to reschedule itself .  Given the current code, the implication is
that this workqueue reschedule will take place on the wrong cpu.  If drop
monitor is used early during the boot process, its possible that two cpus will
access a single per-cpu structure in parallel, possibly leading to data
corruption.

This patch fixes the situation, by storing the cpu number that a given instance
of this per-cpu data should be accessed from.  In the case of a need for a
reschedule, the cpu stored in the struct is assigned the rescheule, rather than
the currently executing cpu

Tested successfully by myself.

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
CC: David Miller <davem@davemloft.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cc: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-10-02 09:47:42 -07:00
Neil Horman
b2f89a7caf drop_monitor: Make updating data->skb smp safe
commit 3885ca785a upstream.

Eric Dumazet pointed out to me that the drop_monitor protocol has some holes in
its smp protections.  Specifically, its possible to replace data->skb while its
being written.  This patch corrects that by making data->skb an rcu protected
variable.  That will prevent it from being overwritten while a tracepoint is
modifying it.

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: David Miller <davem@davemloft.net>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cc: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-10-02 09:47:42 -07:00
Neil Horman
34c0bc428c drop_monitor: fix sleeping in invalid context warning
commit cde2e9a651 upstream.

Eric Dumazet pointed out this warning in the drop_monitor protocol to me:

[   38.352571] BUG: sleeping function called from invalid context at kernel/mutex.c:85
[   38.352576] in_atomic(): 1, irqs_disabled(): 0, pid: 4415, name: dropwatch
[   38.352580] Pid: 4415, comm: dropwatch Not tainted 3.4.0-rc2+ #71
[   38.352582] Call Trace:
[   38.352592]  [<ffffffff8153aaf0>] ? trace_napi_poll_hit+0xd0/0xd0
[   38.352599]  [<ffffffff81063f2a>] __might_sleep+0xca/0xf0
[   38.352606]  [<ffffffff81655b16>] mutex_lock+0x26/0x50
[   38.352610]  [<ffffffff8153aaf0>] ? trace_napi_poll_hit+0xd0/0xd0
[   38.352616]  [<ffffffff810b72d9>] tracepoint_probe_register+0x29/0x90
[   38.352621]  [<ffffffff8153a585>] set_all_monitor_traces+0x105/0x170
[   38.352625]  [<ffffffff8153a8ca>] net_dm_cmd_trace+0x2a/0x40
[   38.352630]  [<ffffffff8154a81a>] genl_rcv_msg+0x21a/0x2b0
[   38.352636]  [<ffffffff810f8029>] ? zone_statistics+0x99/0xc0
[   38.352640]  [<ffffffff8154a600>] ? genl_rcv+0x30/0x30
[   38.352645]  [<ffffffff8154a059>] netlink_rcv_skb+0xa9/0xd0
[   38.352649]  [<ffffffff8154a5f0>] genl_rcv+0x20/0x30
[   38.352653]  [<ffffffff81549a7e>] netlink_unicast+0x1ae/0x1f0
[   38.352658]  [<ffffffff81549d76>] netlink_sendmsg+0x2b6/0x310
[   38.352663]  [<ffffffff8150824f>] sock_sendmsg+0x10f/0x130
[   38.352668]  [<ffffffff8150abe0>] ? move_addr_to_kernel+0x60/0xb0
[   38.352673]  [<ffffffff81515f04>] ? verify_iovec+0x64/0xe0
[   38.352677]  [<ffffffff81509c46>] __sys_sendmsg+0x386/0x390
[   38.352682]  [<ffffffff810ffaf9>] ? handle_mm_fault+0x139/0x210
[   38.352687]  [<ffffffff8165b5bc>] ? do_page_fault+0x1ec/0x4f0
[   38.352693]  [<ffffffff8106ba4d>] ? set_next_entity+0x9d/0xb0
[   38.352699]  [<ffffffff81310b49>] ? tty_ldisc_deref+0x9/0x10
[   38.352703]  [<ffffffff8106d363>] ? pick_next_task_fair+0x63/0x140
[   38.352708]  [<ffffffff8150b8d4>] sys_sendmsg+0x44/0x80
[   38.352713]  [<ffffffff8165f8e2>] system_call_fastpath+0x16/0x1b

It stems from holding a spinlock (trace_state_lock) while attempting to register
or unregister tracepoint hooks, making in_atomic() true in this context, leading
to the warning when the tracepoint calls might_sleep() while its taking a mutex.
Since we only use the trace_state_lock to prevent trace protocol state races, as
well as hardware stat list updates on an rcu write side, we can just convert the
spinlock to a mutex to avoid this problem.

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: David Miller <davem@davemloft.net>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cc: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-10-02 09:47:42 -07:00
Rustad, Mark D
b64295e8b4 net: Statically initialize init_net.dev_base_head
commit 734b65417b upstream.

This change eliminates an initialization-order hazard most
recently seen when netprio_cgroup is built into the kernel.

With thanks to Eric Dumazet for catching a bug.

Signed-off-by: Mark Rustad <mark.d.rustad@intel.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-10-02 09:47:41 -07:00
Alexey Khoroshilov
e869e6223d net/core: Fix potential memory leak in dev_set_alias()
[ Upstream commit 7364e445f6 ]

Do not leak memory by updating pointer with potentially NULL realloc return value.

Found by Linux Driver Verification project (linuxtesting.org).

Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-10-02 09:47:05 -07:00
Ben Hutchings
09c403dc7c tcp: Apply device TSO segment limit earlier
[ Upstream commit 1485348d24 ]

Cache the device gso_max_segs in sock::sk_gso_max_segs and use it to
limit the size of TSO skbs.  This avoids the need to fall back to
software GSO for local TCP senders.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-10-02 09:47:04 -07:00
Ben Hutchings
7f8742aecd net: Allow driver to limit number of GSO segments per skb
[ Upstream commit 30b678d844 ]

A peer (or local user) may cause TCP to use a nominal MSS of as little
as 88 (actual MSS of 76 with timestamps).  Given that we have a
sufficiently prodigious local sender and the peer ACKs quickly enough,
it is nevertheless possible to grow the window for such a connection
to the point that we will try to send just under 64K at once.  This
results in a single skb that expands to 861 segments.

In some drivers with TSO support, such an skb will require hundreds of
DMA descriptors; a substantial fraction of a TX ring or even more than
a full ring.  The TX queue selected for the skb may stall and trigger
the TX watchdog repeatedly (since the problem skb will be retried
after the TX reset).  This particularly affects sfc, for which the
issue is designated as CVE-2012-3412.

Therefore:
1. Add the field net_device::gso_max_segs holding the device-specific
   limit.
2. In netif_skb_features(), if the number of segments is too high then
   mask out GSO features to force fall back to software GSO.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-10-02 09:47:04 -07:00
Theodore Ts'o
118f98c7de net: feed /dev/random with the MAC address when registering a device
commit 7bf2357524 upstream.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: David Miller <davem@davemloft.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-08-15 12:04:13 -07:00
Jiri Benc
c94eb3f964 net: fix rtnetlink IFF_PROMISC and IFF_ALLMULTI handling
[ Upstream commit b1beb681cb ]

When device flags are set using rtnetlink, IFF_PROMISC and IFF_ALLMULTI
flags are handled specially. Function dev_change_flags sets IFF_PROMISC and
IFF_ALLMULTI bits in dev->gflags according to the passed value but
do_setlink passes a result of rtnl_dev_combine_flags which takes those bits
from dev->flags.

This can be easily trigerred by doing:

tcpdump -i eth0 &
ip l s up eth0

ip sets IFF_UP flag in ifi_flags and ifi_change, which is combined with
IFF_PROMISC by rtnl_dev_combine_flags, causing __dev_change_flags to set
IFF_PROMISC in gflags.

Reported-by: Max Matveev <makc@redhat.com>
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-08-09 08:27:52 -07:00
Eric Dumazet
c5a07578be netpoll: fix netpoll_send_udp() bugs
[ Upstream commit 954fba0274 ]

Bogdan Hamciuc diagnosed and fixed following bug in netpoll_send_udp() :

"skb->len += len;" instead of "skb_put(skb, len);"

Meaning that _if_ a network driver needs to call skb_realloc_headroom(),
only packet headers would be copied, leaving garbage in the payload.

However the skb_realloc_headroom() must be avoided as much as possible
since it requires memory and netpoll tries hard to work even if memory
is exhausted (using a pool of preallocated skbs)

It appears netpoll_send_udp() reserved 16 bytes for the ethernet header,
which happens to work for typicall drivers but not all.

Right thing is to use LL_RESERVED_SPACE(dev)
(And also add dev->needed_tailroom of tailroom)

This patch combines both fixes.

Many thanks to Bogdan for raising this issue.

Reported-by: Bogdan Hamciuc <bogdan.hamciuc@freescale.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Tested-by: Bogdan Hamciuc <bogdan.hamciuc@freescale.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Neil Horman <nhorman@tuxdriver.com>
Reviewed-by: Neil Horman <nhorman@tuxdriver.com>
Reviewed-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-07-16 08:47:38 -07:00
Michał Mirosław
d3a673fb54 ethtool: allow ETHTOOL_GSSET_INFO for users
[ Upstream commit f80400a26a ]

Allow ETHTOOL_GSSET_INFO ethtool ioctl() for unprivileged users.
ETHTOOL_GSTRINGS is already allowed, but is unusable without this one.

Signed-off-by: Micha©© Miros©©aw <mirq-linux@rere.qmqm.pl>
Acked-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-07-16 08:47:37 -07:00
Jason Wang
325b4161ba net: sock: validate data_len before allocating skb in sock_alloc_send_pskb()
[ Upstream commit cc9b17ad29 ]

We need to validate the number of pages consumed by data_len, otherwise frags
array could be overflowed by userspace. So this patch validate data_len and
return -EMSGSIZE when data_len may occupies more frags than MAX_SKB_FRAGS.

Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-07-16 08:47:36 -07:00
David S. Miller
83bba79790 Revert "net: maintain namespace isolation between vlan and real device"
[ Upstream commit 59b9997bab ]

This reverts commit 8a83a00b07.

It causes regressions for S390 devices, because it does an
unconditional DST drop on SKBs for vlans and the QETH device
needs the neighbour entry hung off the DST for certain things
on transmit.

Arnd can't remember exactly why he even needed this change.

Conflicts:

	drivers/net/macvlan.c
	net/8021q/vlan_dev.c
	net/core/dev.c

Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-06-10 00:33:03 +09:00
Eric Dumazet
a2abc1310f pktgen: fix module unload for good
[ Upstream commit d4b1133558 ]

commit c57b546840 (pktgen: fix crash at module unload) did a very poor
job with list primitives.

1) list_splice() arguments were in the wrong order

2) list_splice(list, head) has undefined behavior if head is not
initialized.

3) We should use the list_splice_init() variant to clear pktgen_threads
list.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-06-10 00:33:03 +09:00
Eric Dumazet
45d21c1368 pktgen: fix crash at module unload
[ Upstream commit c57b546840 ]

commit 7d3d43dab4 (net: In unregister_netdevice_notifier unregister
the netdevices.) makes pktgen crashing at module unload.

[  296.820578] BUG: spinlock bad magic on CPU#6, rmmod/3267
[  296.820719]  lock: ffff880310c38000, .magic: ffff8803, .owner: <none>/-1, .owner_cpu: -1
[  296.820943] Pid: 3267, comm: rmmod Not tainted 3.4.0-rc5+ #254
[  296.821079] Call Trace:
[  296.821211]  [<ffffffff8168a715>] spin_dump+0x8a/0x8f
[  296.821345]  [<ffffffff8168a73b>] spin_bug+0x21/0x26
[  296.821507]  [<ffffffff812b4741>] do_raw_spin_lock+0x131/0x140
[  296.821648]  [<ffffffff8169188e>] _raw_spin_lock+0x1e/0x20
[  296.821786]  [<ffffffffa00cc0fd>] __pktgen_NN_threads+0x4d/0x140 [pktgen]
[  296.821928]  [<ffffffffa00ccf8d>] pktgen_device_event+0x10d/0x1e0 [pktgen]
[  296.822073]  [<ffffffff8154ed4f>] unregister_netdevice_notifier+0x7f/0x100
[  296.822216]  [<ffffffffa00d2a0b>] pg_cleanup+0x48/0x73 [pktgen]
[  296.822357]  [<ffffffff8109528e>] sys_delete_module+0x17e/0x2a0
[  296.822502]  [<ffffffff81699652>] system_call_fastpath+0x16/0x1b

Hold the pktgen_thread_lock while splicing pktgen_threads, and test
pktgen_exiting in pktgen_device_event() to make unload faster.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-06-10 00:33:03 +09:00
Eric W. Biederman
4e133084e9 net: In unregister_netdevice_notifier unregister the netdevices.
[ Upstream commit 7d3d43dab4 ]

We already synthesize events in register_netdevice_notifier and synthesizing
events in unregister_netdevice_notifier allows to us remove the need for
special case cleanup code.

This change should be safe as it adds no new cases for existing callers
of unregiser_netdevice_notifier to handle.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-05-21 09:39:59 -07:00
Julian Anastasov
0958c122f4 netns: do not leak net_generic data on failed init
[ Upstream commit b922934d01 ]

ops_init should free the net_generic data on
init failure and __register_pernet_operations should not
call ops_free when NET_NS is not enabled.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Reviewed-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-04-27 09:51:21 -07:00
Eric Dumazet
6d7946bd33 net: fix a race in sock_queue_err_skb()
[ Upstream commit 110c43304d ]

As soon as an skb is queued into socket error queue, another thread
can consume it, so we are not allowed to reference skb anymore, or risk
use after free.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-04-27 09:51:19 -07:00
Eric Dumazet
99b8230dac net: fix napi_reuse_skb() skb reserve
[ Upstream commit 2a2a459eee ]

napi->skb is allocated in napi_get_frags() using
netdev_alloc_skb_ip_align(), with a reserve of NET_SKB_PAD +
NET_IP_ALIGN bytes.

However, when such skb is recycled in napi_reuse_skb(), it ends with a
reserve of NET_IP_ALIGN which is suboptimal.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-04-02 09:27:21 -07:00
Michel Machado
035e3f6e8d neighbour: Fixed race condition at tbl->nht
[ Upstream commit 84338a6c9d ]

When the fixed race condition happens:

1. While function neigh_periodic_work scans the neighbor hash table
pointed by field tbl->nht, it unlocks and locks tbl->lock between
buckets in order to call cond_resched.

2. Assume that function neigh_periodic_work calls cond_resched, that is,
the lock tbl->lock is available, and function neigh_hash_grow runs.

3. Once function neigh_hash_grow finishes, and RCU calls
neigh_hash_free_rcu, the original struct neigh_hash_table that function
neigh_periodic_work was using doesn't exist anymore.

4. Once back at neigh_periodic_work, whenever the old struct
neigh_hash_table is accessed, things can go badly.

Signed-off-by: Michel Machado <michel@digirati.com.br>
CC: "David S. Miller" <davem@davemloft.net>
CC: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-03-19 08:57:45 -07:00
Eric Dumazet
9f8a28dca6 netpoll: netpoll_poll_dev() should access dev->flags
[ Upstream commit 58e05f357a ]

commit 5a698af53f (bond: service netpoll arp queue on master device)
tested IFF_SLAVE flag against dev->priv_flags instead of dev->flags

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: WANG Cong <amwang@redhat.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-02-29 16:34:06 -08:00
Eric Dumazet
32fa5d8323 gro: more generic L2 header check
[ Upstream commit 5ca3b72c5d ]

Shlomo Pongratz reported GRO L2 header check was suited for Ethernet
only, and failed on IB/ipoib traffic.

He provided a patch faking a zeroed header to let GRO aggregates frames.

Roland Dreier, Herbert Xu, and others suggested we change GRO L2 header
check to be more generic, ie not assuming L2 header is 14 bytes, but
taking into account hard_header_len.

__napi_gro_receive() has special handling for the common case (Ethernet)
to avoid a memcmp() call and use an inline optimized function instead.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Reported-by: Shlomo Pongratz <shlomop@mellanox.com>
Cc: Roland Dreier <roland@kernel.org>
Cc: Or Gerlitz <ogerlitz@mellanox.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Tested-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-02-29 16:33:46 -08:00
Eric Dumazet
8a533666d1 net: fix NULL dereferences in check_peer_redir()
[ Upstream commit d3aaeb38c4, along
  with dependent backports of commits:
     69cce1d140
     9de79c127c
     218fa90f07
     580da35a31
     f7e57044ee
     e049f28883 ]

Gergely Kalman reported crashes in check_peer_redir().

It appears commit f39925dbde (ipv4: Cache learned redirect
information in inetpeer.) added a race, leading to possible NULL ptr
dereference.

Since we can now change dst neighbour, we should make sure a reader can
safely use a neighbour.

Add RCU protection to dst neighbour, and make sure check_peer_redir()
can be called safely by different cpus in parallel.

As neighbours are already freed after one RCU grace period, this patch
should not add typical RCU penalty (cache cold effects)

Many thanks to Gergely for providing a pretty report pointing to the
bug.

Reported-by: Gergely Kalman <synapse@hippy.csoma.elte.hu>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-02-13 11:06:13 -08:00
Eric Dumazet
561331eae0 netns: fix net_alloc_generic()
[ Upstream commit 073862ba5d ]

When a new net namespace is created, we should attach to it a "struct
net_generic" with enough slots (even empty), or we can hit the following
BUG_ON() :

[  200.752016] kernel BUG at include/net/netns/generic.h:40!
...
[  200.752016]  [<ffffffff825c3cea>] ? get_cfcnfg+0x3a/0x180
[  200.752016]  [<ffffffff821cf0b0>] ? lockdep_rtnl_is_held+0x10/0x20
[  200.752016]  [<ffffffff825c41be>] caif_device_notify+0x2e/0x530
[  200.752016]  [<ffffffff810d61b7>] notifier_call_chain+0x67/0x110
[  200.752016]  [<ffffffff810d67c1>] raw_notifier_call_chain+0x11/0x20
[  200.752016]  [<ffffffff821bae82>] call_netdevice_notifiers+0x32/0x60
[  200.752016]  [<ffffffff821c2b26>] register_netdevice+0x196/0x300
[  200.752016]  [<ffffffff821c2ca9>] register_netdev+0x19/0x30
[  200.752016]  [<ffffffff81c1c67a>] loopback_net_init+0x4a/0xa0
[  200.752016]  [<ffffffff821b5e62>] ops_init+0x42/0x180
[  200.752016]  [<ffffffff821b600b>] setup_net+0x6b/0x100
[  200.752016]  [<ffffffff821b6466>] copy_net_ns+0x86/0x110
[  200.752016]  [<ffffffff810d5789>] create_new_namespaces+0xd9/0x190

net_alloc_generic() should take into account the maximum index into the
ptr array, as a subsystem might use net_generic() anytime.

This also reduces number of reallocations in net_assign_generic()

Reported-by: Sasha Levin <levinsasha928@gmail.com>
Tested-by: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Sjur Brændeland <sjur.brandeland@stericsson.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-02-03 09:19:03 -08:00
dpward
3fa57c1bf5 net: Handle different key sizes between address families in flow cache
commit aa1c366e4f upstream.

With the conversion of struct flowi to a union of AF-specific structs, some
operations on the flow cache need to account for the exact size of the key.

Signed-off-by: David Ward <david.ward@ll.mit.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cc: Kim Phillips <kim.phillips@freescale.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-11-11 09:37:17 -08:00
Thomas Gleixner
5796ee3058 net: Unlock sock before calling sk_free()
[ Upstream commit b0691c8ee7 ]

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-11-11 09:36:50 -08:00
Richard Cochran
babba877da net: hold sock reference while processing tx timestamps
commit da92b194cc upstream.

The pair of functions,

 * skb_clone_tx_timestamp()
 * skb_complete_tx_timestamp()

were designed to allow timestamping in PHY devices. The first
function, called during the MAC driver's hard_xmit method, identifies
PTP protocol packets, clones them, and gives them to the PHY device
driver. The PHY driver may hold onto the packet and deliver it at a
later time using the second function, which adds the packet to the
socket's error queue.

As pointed out by Johannes, nothing prevents the socket from
disappearing while the cloned packet is sitting in the PHY driver
awaiting a timestamp. This patch fixes the issue by taking a reference
on the socket for each such packet. In addition, the comments
regarding the usage of these function are expanded to highlight the
rule that PHY drivers must use skb_complete_tx_timestamp() to release
the packet, in order to release the socket reference, too.

These functions first appeared in v2.6.36.

Reported-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Richard Cochran <richard.cochran@omicron.at>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Reviewed-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-11-11 09:35:52 -08:00
Eric W. Biederman
32779fa065 rtnetlink: Add missing manual netlink notification in dev_change_net_namespaces
commit d2237d3574 upstream.

Renato Westphal noticed that since commit a2835763e1
"rtnetlink: handle rtnl_link netlink notifications manually" was merged
we no longer send a netlink message when a networking device is moved
from one network namespace to another.

Fix this by adding the missing manual notification in dev_change_net_namespaces.

Since all network devices that are processed by dev_change_net_namspaces are
in the initialized state the complicated tests that guard the manual
rtmsg_ifinfo calls in rollback_registered and register_netdevice are
unnecessary and we can just perform a plain notification.

Tested-by: Renato Westphal <renatowestphal@gmail.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-11-11 09:35:50 -08:00
Tim Chen
265d5c2eb2 scm: Capture the full credentials of the scm sender
[ Upstream commit e33f7a9f37 ]

This patch corrects an erroneous update of credential's gid with uid
introduced in commit 257b5358b3 since 2.6.36.

Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Reviewed-by: James Morris <jmorris@namei.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-10-03 11:40:54 -07:00
Gao feng
cbab190c50 fib:fix BUG_ON in fib_nl_newrule when add new fib rule
[ Upstream commit 561dac2d41 ]

add new fib rule can cause BUG_ON happen
the reproduce shell is
ip rule add pref 38
ip rule add pref 38
ip rule add to 192.168.3.0/24 goto 38
ip rule del pref 38
ip rule add to 192.168.3.0/24 goto 38
ip rule add pref 38

then the BUG_ON will happen
del BUG_ON and use (ctarget == NULL) identify whether this rule is unresolved

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-10-03 11:40:51 -07:00
Eric Dumazet
8e24aecbcd arp: fix rcu lockdep splat in arp_process()
[ Upstream commit 20e6074eb8 ]

Dave Jones reported a lockdep splat triggered by an arp_process() call
from parp_redo().

Commit faa9dcf793 (arp: RCU changes) is the origin of the bug, since
it assumed arp_process() was called under rcu_read_lock(), which is not
true in this particular path.

Instead of adding rcu_read_lock() in parp_redo(), I chose to add it in
neigh_proxy_process() to take care of IPv6 side too.

 ===================================================
 [ INFO: suspicious rcu_dereference_check() usage. ]
 ---------------------------------------------------
 include/linux/inetdevice.h:209 invoked rcu_dereference_check() without
protection!

 other info that might help us debug this:

 rcu_scheduler_active = 1, debug_locks = 0
 4 locks held by setfiles/2123:
  #0:  (&sb->s_type->i_mutex_key#13){+.+.+.}, at: [<ffffffff8114cbc4>]
walk_component+0x1ef/0x3e8
  #1:  (&isec->lock){+.+.+.}, at: [<ffffffff81204bca>]
inode_doinit_with_dentry+0x3f/0x41f
  #2:  (&tbl->proxy_timer){+.-...}, at: [<ffffffff8106a803>]
run_timer_softirq+0x157/0x372
  #3:  (class){+.-...}, at: [<ffffffff8141f256>] neigh_proxy_process
+0x36/0x103

 stack backtrace:
 Pid: 2123, comm: setfiles Tainted: G        W
3.1.0-0.rc2.git7.2.fc16.x86_64 #1
 Call Trace:
  <IRQ>  [<ffffffff8108ca23>] lockdep_rcu_dereference+0xa7/0xaf
  [<ffffffff8146a0b7>] __in_dev_get_rcu+0x55/0x5d
  [<ffffffff8146a751>] arp_process+0x25/0x4d7
  [<ffffffff8146ac11>] parp_redo+0xe/0x10
  [<ffffffff8141f2ba>] neigh_proxy_process+0x9a/0x103
  [<ffffffff8106a8c4>] run_timer_softirq+0x218/0x372
  [<ffffffff8106a803>] ? run_timer_softirq+0x157/0x372
  [<ffffffff8141f220>] ? neigh_stat_seq_open+0x41/0x41
  [<ffffffff8108f2f0>] ? mark_held_locks+0x6d/0x95
  [<ffffffff81062bb6>] __do_softirq+0x112/0x25a
  [<ffffffff8150d27c>] call_softirq+0x1c/0x30
  [<ffffffff81010bf5>] do_softirq+0x4b/0xa2
  [<ffffffff81062f65>] irq_exit+0x5d/0xcf
  [<ffffffff8150dc11>] smp_apic_timer_interrupt+0x7c/0x8a
  [<ffffffff8150baf3>] apic_timer_interrupt+0x73/0x80
  <EOI>  [<ffffffff8108f439>] ? trace_hardirqs_on_caller+0x121/0x158
  [<ffffffff814fc285>] ? __slab_free+0x30/0x24c
  [<ffffffff814fc283>] ? __slab_free+0x2e/0x24c
  [<ffffffff81204e74>] ? inode_doinit_with_dentry+0x2e9/0x41f
  [<ffffffff81204e74>] ? inode_doinit_with_dentry+0x2e9/0x41f
  [<ffffffff81204e74>] ? inode_doinit_with_dentry+0x2e9/0x41f
  [<ffffffff81130cb0>] kfree+0x108/0x131
  [<ffffffff81204e74>] inode_doinit_with_dentry+0x2e9/0x41f
  [<ffffffff81204fc6>] selinux_d_instantiate+0x1c/0x1e
  [<ffffffff81200f4f>] security_d_instantiate+0x21/0x23
  [<ffffffff81154625>] d_instantiate+0x5c/0x61
  [<ffffffff811563ca>] d_splice_alias+0xbc/0xd2
  [<ffffffff811b17ff>] ext4_lookup+0xba/0xeb
  [<ffffffff8114bf1e>] d_alloc_and_lookup+0x45/0x6b
  [<ffffffff8114cbea>] walk_component+0x215/0x3e8
  [<ffffffff8114cdf8>] lookup_last+0x3b/0x3d
  [<ffffffff8114daf3>] path_lookupat+0x82/0x2af
  [<ffffffff8110fc53>] ? might_fault+0xa5/0xac
  [<ffffffff8110fc0a>] ? might_fault+0x5c/0xac
  [<ffffffff8114c564>] ? getname_flags+0x31/0x1ca
  [<ffffffff8114dd48>] do_path_lookup+0x28/0x97
  [<ffffffff8114df2c>] user_path_at+0x59/0x96
  [<ffffffff811467ad>] ? cp_new_stat+0xf7/0x10d
  [<ffffffff811469a6>] vfs_fstatat+0x44/0x6e
  [<ffffffff811469ee>] vfs_lstat+0x1e/0x20
  [<ffffffff81146b3d>] sys_newlstat+0x1a/0x33
  [<ffffffff8108f439>] ? trace_hardirqs_on_caller+0x121/0x158
  [<ffffffff812535fe>] ? trace_hardirqs_on_thunk+0x3a/0x3f
  [<ffffffff8150af82>] system_call_fastpath+0x16/0x1b

Reported-by: Dave Jones <davej@redhat.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-10-03 11:40:50 -07:00
stephen hemminger
c8656c500d net: allow netif_carrier to be called safely from IRQ
[ Upstream commit 1821f7cd65 ]

As reported by Ben Greer and Froncois Romieu. The code path in
the netif_carrier code leads it to try and disable
a late workqueue to reenable it immediately
netif_carrier_on
-> linkwatch_fire_event
   -> linkwatch_schedule_work
      -> cancel_delayed_work
         -> del_timer_sync

If __cancel_delayed_work is used instead then there is no
problem of waiting for running linkwatch_event.

There is a race between linkwatch_event running re-scheduling
but it is harmless to schedule an extra scan of the linkwatch queue.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-08-15 18:31:39 -07:00
Neil Horman
60f17a7798 net: add IFF_SKB_TX_SHARED flag to priv_flags
[ Upstream commit d887331506 ]

Pktgen attempts to transmit shared skbs to net devices, which can't be used by
some drivers as they keep state information in skbs.  This patch adds a flag
marking drivers as being able to handle shared skbs in their tx path.  Drivers
are defaulted to being unable to do so, but calling ether_setup enables this
flag, as 90% of the drivers calling ether_setup touch real hardware and can
handle shared skbs.  A subsequent patch will audit drivers to ensure that the
flag is set properly

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Reported-by: Jiri Pirko <jpirko@redhat.com>
CC: Robert Olsson <robert.olsson@its.uu.se>
CC: Eric Dumazet <eric.dumazet@gmail.com>
CC: Alexey Dobriyan <adobriyan@gmail.com>
CC: David S. Miller <davem@davemloft.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-08-15 18:31:38 -07:00
David S. Miller
e997d47bff net: Compute protocol sequence numbers and fragment IDs using MD5.
Computers have become a lot faster since we compromised on the
partial MD4 hash which we use currently for performance reasons.

MD5 is a much safer choice, and is inline with both RFC1948 and
other ISS generators (OpenBSD, Solaris, etc.)

Furthermore, only having 24-bits of the sequence number be truly
unpredictable is a very serious limitation.  So the periodic
regeneration and 8-bit counter have been removed.  We compute and
use a full 32-bit sequence number.

For ipv6, DCCP was found to use a 32-bit truncated initial sequence
number (it needs 43-bits) and that is fixed here as well.

Reported-by: Dan Kaminsky <dan@doxpara.com>
Tested-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-08-15 18:31:35 -07:00
Ben Hutchings
3de8ae6c0d ethtool: Allow zero-length register dumps again
commit 67ae7cf1ee upstream.

Some drivers (ab)use the ethtool_ops::get_regs operation to expose
only a hardware revision ID.  Commit
a77f5db361 ('ethtool: Allocate register
dump buffer with vmalloc()') had the side-effect of breaking these, as
vmalloc() returns a null pointer for size=0 whereas kmalloc() did not.

For backward-compatibility, allow zero-length dumps again.

Reported-by: Kalle Valo <kvalo@qca.qualcomm.com>
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-08-04 21:58:34 -07:00
David S. Miller
957c665f37 ipv6: Don't put artificial limit on routing table size.
IPV6, unlike IPV4, doesn't have a routing cache.

Routing table entries, as well as clones made in response
to route lookup requests, all live in the same table.  And
all of these things are together collected in the destination
cache table for ipv6.

This means that routing table entries count against the garbage
collection limits, even though such entries cannot ever be reclaimed
and are added explicitly by the administrator (rather than being
created in response to lookups).

Therefore it makes no sense to count ipv6 routing table entries
against the GC limits.

Add a DST_NOCOUNT destination cache entry flag, and skip the counting
if it is set.  Use this flag bit in ipv6 when adding routing table
entries.

Signed-off-by: David S. Miller <davem@davemloft.net>
2011-07-01 17:30:43 -07:00
Linus Torvalds
8dac6bee32 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
  AFS: Use i_generation not i_version for the vnode uniquifier
  AFS: Set s_id in the superblock to the volume name
  vfs: Fix data corruption after failed write in __block_write_begin()
  afs: afs_fill_page reads too much, or wrong data
  VFS: Fix vfsmount overput on simultaneous automount
  fix wrong iput on d_inode introduced by e6bc45d65d
  Delay struct net freeing while there's a sysfs instance refering to it
  afs: fix sget() races, close leak on umount
  ubifs: fix sget races
  ubifs: split allocation of ubifs_info into a separate function
  fix leak in proc_set_super()
2011-06-16 10:21:59 -07:00
Al Viro
a685e08987 Delay struct net freeing while there's a sysfs instance refering to it
* new refcount in struct net, controlling actual freeing of the memory
	* new method in kobj_ns_type_operations (->drop_ns())
	* ->current_ns() semantics change - it's supposed to be followed by
corresponding ->drop_ns().  For struct net in case of CONFIG_NET_NS it bumps
the new refcount; net_drop_ns() decrements it and calls net_free() if the
last reference has been dropped.  Method renamed to ->grab_current_ns().
	* old net_free() callers call net_drop_ns() instead.
	* sysfs_exit_ns() is gone, along with a large part of callchain
leading to it; now that the references stored in ->ns[...] stay valid we
do not need to hunt them down and replace them with NULL.  That fixes
problems in sysfs_lookup() and sysfs_readdir(), along with getting rid
of sb->s_instances abuse.

	Note that struct net *shutdown* logics has not changed - net_cleanup()
is called exactly when it used to be called.  The only thing postponed by
having a sysfs instance refering to that struct net is actual freeing of
memory occupied by struct net.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-06-12 17:45:41 -04:00