Commit graph

30,660 commits

Author SHA1 Message Date
David S. Miller
4fbef95af4 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/ethernet/emulex/benet/be.h
	drivers/net/usb/qmi_wwan.c
	drivers/net/wireless/brcm80211/brcmfmac/dhd_bus.h
	include/net/netfilter/nf_conntrack_synproxy.h
	include/net/secure_seq.h

The conflicts are of two varieties:

1) Conflicts with Joe Perches's 'extern' removal from header file
   function declarations.  Usually it's an argument signature change
   or a function being added/removed.  The resolutions are trivial.

2) Some overlapping changes in qmi_wwan.c and be.h, one commit adds
   a new value, another changes an existing value.  That sort of
   thing.

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-01 17:06:14 -04:00
Linus Torvalds
c31eeaced2 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking changes from David Miller:

 1) Multiply in netfilter IPVS can overflow when calculating destination
    weight.  From Simon Kirby.

 2) Use after free fixes in IPVS from Julian Anastasov.

 3) SFC driver bug fixes from Daniel Pieczko.

 4) Memory leak in pcan_usb_core failure paths, from Alexey Khoroshilov.

 5) Locking and encapsulation fixes to serial line CAN driver, from
    Andrew Naujoks.

 6) Duplex and VF handling fixes to bnx2x driver from Yaniv Rosner,
    Eilon Greenstein, and Ariel Elior.

 7) In lapb, if no other packets are outstanding, T1 timeouts actually
    stall things and no packet gets sent.  Fix from Josselin Costanzi.

 8) ICMP redirects should not make it to the socket error queues, from
    Duan Jiong.

 9) Fix bugs in skge DMA mapping error handling, from Nikulas Patocka.

10) Fix setting of VLAN priority field on via-rhine driver, from Roget
    Luethi.

11) Fix TX stalls and VLAN promisc programming in be2net driver from
    Ajit Khaparde.

12) Packet padding doesn't get handled correctly in new usbnet SG
    support code, from Ming Lei.

13) Fix races in netdevice teardown wrt.  network namespace closing.
    From Eric W.  Biederman.

14) Fix potential missed initialization of net_secret if not TCP
    connections are openned.  From Eric Dumazet.

15) Cinterion PLXX product ID in qmi_wwan driver is wrong, from
    Aleksander Morgado.

16) skb_cow_head() can change skb->data and thus packet header pointers,
    don't use stale ip_hdr reference in ip_tunnel code.

17) Backend state transition handling fixes in xen-netback, from Paul
    Durrant.

18) Packet offset for AH protocol is handled wrong in flow dissector,
    from Eric Dumazet.

19) Taking down an fq packet scheduler instance can leave stale packets
    in the queues, fix from Eric Dumazet.

20) Fix performance regressions introduced by TCP Small Queues.  From
    Eric Dumazet.

21) IPV6 GRE tunneling code calculates max_headroom incorrectly, from
    Hannes Frederic Sowa.

22) Multicast timer handlers in ipv4 and ipv6 can be the last and final
    reference to the ipv4/ipv6 specific network device state, so use the
    reference put that will check and release the object if the
    reference hits zero.  From Salam Noureddine.

23) Fix memory corruption in ip_tunnel driver, and use skb_push()
    instead of __skb_push() so that similar bugs are less hard to find.
    From Steffen Klassert.

24) Add forgotten hookup of rtnl_ops in SIT and ip6tnl drivers, from
    Nicolas Dichtel.

25) fq scheduler doesn't accurately rate limit in certain circumstances,
    from Eric Dumazet.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (103 commits)
  pkt_sched: fq: rate limiting improvements
  ip6tnl: allow to use rtnl ops on fb tunnel
  sit: allow to use rtnl ops on fb tunnel
  ip_tunnel: Remove double unregister of the fallback device
  ip_tunnel_core: Change __skb_push back to skb_push
  ip_tunnel: Add fallback tunnels to the hash lists
  ip_tunnel: Fix a memory corruption in ip_tunnel_xmit
  qlcnic: Fix SR-IOV configuration
  ll_temac: Reset dma descriptors indexes on ndo_open
  skbuff: size of hole is wrong in a comment
  ipv6 mcast: use in6_dev_put in timer handlers instead of __in6_dev_put
  ipv4 igmp: use in_dev_put in timer handlers instead of __in_dev_put
  ethernet: moxa: fix incorrect placement of __initdata tag
  ipv6: gre: correct calculation of max_headroom
  powerpc/83xx: gianfar_ptp: select 1588 clock source through dts file
  Revert "powerpc/83xx: gianfar_ptp: select 1588 clock source through dts file"
  bonding: Fix broken promiscuity reference counting issue
  tcp: TSQ can use a dynamic limit
  dm9601: fix IFF_ALLMULTI handling
  pkt_sched: fq: qdisc dismantle fixes
  ...
2013-10-01 12:58:48 -07:00
Eric Dumazet
0eab5eb7a3 pkt_sched: fq: rate limiting improvements
FQ rate limiting suffers from two problems, reported
by Steinar :

1) FQ enforces a delay when flow quantum is exhausted in order
to reduce cpu overhead. But if packets are small, current
delay computation is slightly wrong, and observed rates can
be too high.

Steinar had this problem because he disabled TSO and GSO,
and default FQ quantum is 2*1514.

(Of course, I wish recent TSO auto sizing changes will help
to not having to disable TSO in the first place)

2) maxrate was not used for forwarded flows (skbs not attached
to a socket)

Tested:

tc qdisc add dev eth0 root est 1sec 4sec fq maxrate 8Mbit
netperf -H lpq84 -l 1000 &
sleep 10 ; tc -s qdisc show dev eth0
qdisc fq 8003: root refcnt 32 limit 10000p flow_limit 100p buckets 1024
 quantum 3028 initial_quantum 15140 maxrate 8000Kbit
 Sent 16819357 bytes 11258 pkt (dropped 0, overlimits 0 requeues 0)
 rate 7831Kbit 653pps backlog 7570b 5p requeues 0
  44 flows (43 inactive, 1 throttled), next packet delay 2977352 ns
  0 gc, 0 highprio, 5545 throttled

lpq83:~# tcpdump -p -i eth0 host lpq84 -c 12
09:02:52.079484 IP lpq83 > lpq84: . 1389536928:1389538376(1448) ack 3808678021 win 457 <nop,nop,timestamp 961812 572609068>
09:02:52.079499 IP lpq83 > lpq84: . 1448:2896(1448) ack 1 win 457 <nop,nop,timestamp 961812 572609068>
09:02:52.079906 IP lpq84 > lpq83: . ack 2896 win 16384 <nop,nop,timestamp 572609080 961812>
09:02:52.082568 IP lpq83 > lpq84: . 2896:4344(1448) ack 1 win 457 <nop,nop,timestamp 961815 572609071>
09:02:52.082581 IP lpq83 > lpq84: . 4344:5792(1448) ack 1 win 457 <nop,nop,timestamp 961815 572609071>
09:02:52.083017 IP lpq84 > lpq83: . ack 5792 win 16384 <nop,nop,timestamp 572609083 961815>
09:02:52.085678 IP lpq83 > lpq84: . 5792:7240(1448) ack 1 win 457 <nop,nop,timestamp 961818 572609074>
09:02:52.085693 IP lpq83 > lpq84: . 7240:8688(1448) ack 1 win 457 <nop,nop,timestamp 961818 572609074>
09:02:52.086117 IP lpq84 > lpq83: . ack 8688 win 16384 <nop,nop,timestamp 572609086 961818>
09:02:52.088792 IP lpq83 > lpq84: . 8688:10136(1448) ack 1 win 457 <nop,nop,timestamp 961821 572609077>
09:02:52.088806 IP lpq83 > lpq84: . 10136:11584(1448) ack 1 win 457 <nop,nop,timestamp 961821 572609077>
09:02:52.089217 IP lpq84 > lpq83: . ack 11584 win 16384 <nop,nop,timestamp 572609090 961821>

Reported-by: Steinar H. Gunderson <sesse@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-01 13:00:38 -04:00
Nicolas Dichtel
bb8140947a ip6tnl: allow to use rtnl ops on fb tunnel
rtnl ops where introduced by c075b13098 ("ip6tnl: advertise tunnel param via
rtnl"), but I forget to assign rtnl ops to fb tunnels.

Now that it is done, we must remove the explicit call to
unregister_netdevice_queue(), because  the fallback tunnel is added to the queue
in ip6_tnl_destroy_tunnels() when checking rtnl_link_ops of all netdevices (this
is valid since commit 0bd8762824 ("ip6tnl: add x-netns support")).

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-01 12:55:53 -04:00
Nicolas Dichtel
205983c437 sit: allow to use rtnl ops on fb tunnel
rtnl ops where introduced by ba3e3f50a0 ("sit: advertise tunnel param via
rtnl"), but I forget to assign rtnl ops to fb tunnels.

Now that it is done, we must remove the explicit call to
unregister_netdevice_queue(), because  the fallback tunnel is added to the queue
in sit_destroy_tunnels() when checking rtnl_link_ops of all netdevices (this
is valid since commit 5e6700b3bf ("sit: add support of x-netns")).

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-01 12:55:53 -04:00
Steffen Klassert
cfe4a53692 ip_tunnel: Remove double unregister of the fallback device
When queueing the netdevices for removal, we queue the
fallback device twice in ip_tunnel_destroy(). The first
time when we queue all netdevices in the namespace and
then again explicitly. Fix this by removing the explicit
queueing of the fallback device.

Bug was introduced when network namespace support was added
with commit 6c742e714d ("ipip: add x-netns support").

Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-01 12:42:16 -04:00
Steffen Klassert
78a3694d44 ip_tunnel_core: Change __skb_push back to skb_push
Git commit 0e6fbc5b ("ip_tunnels: extend iptunnel_xmit()")
moved the IP header installation to iptunnel_xmit() and
changed skb_push() to __skb_push(). This makes possible
bugs hard to track down, so change it back to skb_push().

Cc: Pravin Shelar <pshelar@nicira.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-01 12:42:16 -04:00
Steffen Klassert
6701328262 ip_tunnel: Add fallback tunnels to the hash lists
Currently we can not update the tunnel parameters of
the fallback tunnels because we don't find them in the
hash lists. Fix this by adding them on initialization.

Bug was introduced with commit c544193214
("GRE: Refactor GRE tunneling code.")

Cc: Pravin Shelar <pshelar@nicira.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-01 12:42:16 -04:00
Steffen Klassert
3e08f4a72f ip_tunnel: Fix a memory corruption in ip_tunnel_xmit
We might extend the used aera of a skb beyond the total
headroom when we install the ipip header. Fix this by
calling skb_cow_head() unconditionally.

Bug was introduced with commit c544193214
("GRE: Refactor GRE tunneling code.")

Cc: Pravin Shelar <pshelar@nicira.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-01 12:42:16 -04:00
David S. Miller
e024bdc051 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf
Pablo Neira Ayuso says:

====================
The following patchset contains Netfilter/IPVS fixes for your net
tree, they are:

* Fix BUG_ON splat due to malformed TCP packets seen by synproxy, from
  Patrick McHardy.

* Fix possible weight overflow in lblc and lblcr schedulers due to
  32-bits arithmetics, from Simon Kirby.

* Fix possible memory access race in the lblc and lblcr schedulers,
  introduced when it was converted to use RCU, two patches from
  Julian Anastasov.

* Fix hard dependency on CPU 0 when reading per-cpu stats in the
  rate estimator, from Julian Anastasov.

* Fix race that may lead to object use after release, when invoking
  ipvsadm -C && ipvsadm -R, introduced when adding RCU, from Julian
  Anastasov.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-01 12:39:35 -04:00
Johannes Berg
131a19bc92 regulatory: enable channels 52-64 and 100-144 for world roaming
If allowed in a country, these channels typically require DFS
so mark them as such. Channel 144 is a bit special, it's coming
into use now to allow more VHT 80 channels, but world roaming
with passive scanning is acceptable anyway. It seems fairly
unlikely that it'll be used as the control channel for a VHT
AP, but it needs to be present to allow a full VHT connection
to an AP that uses it as one of the secondary channels.

Also enable VHT 160 on these channels, and also for channels
36-48 to be able to use VHT 160 there.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-10-01 14:14:02 +02:00
Pablo Neira Ayuso
91cb498e6a netfilter: cttimeout: allow to set/get default protocol timeouts
Default timeouts are currently set via proc/sysctl interface, the
typical pattern is a file name like:

/proc/sys/net/netfilter/nf_conntrack_PROTOCOL_timeout_STATE

This results in one entry per default protocol state timeout.
This patch simplifies this by allowing to set default protocol
timeouts via cttimeout netlink interface.

This should allow us to get rid of the existing proc/sysctl code
in the midterm.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-10-01 13:17:39 +02:00
Simon Wunderlich
ff311bc11a nl80211: allow CAC only if no operation is going on
A CAC should fail if it is triggered while the interface is already
running.

Signed-off-by: Simon Wunderlich <siwu@hrz.tu-chemnitz.de>
Signed-off-by: Mathias Kretschmer <mathias.kretschmer@fokus.fraunhofer.de>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-10-01 13:05:59 +02:00
holger@eitzenberger.org
180cf72f56 netfilter: nf_ct_sip: consolidate NAT hook functions
There are currently seven different NAT hooks used in both
nf_conntrack_sip and nf_nat_sip, each of the hooks is exported in
nf_conntrack_sip, then set from the nf_nat_sip NAT helper.

And because each of them is exported there is quite some overhead
introduced due of this.

By introducing nf_nat_sip_hooks I am able to reduce both text/data
somewhat.  For nf_conntrack_sip e. g. I get

        text             data              bss              dec
old    15243             5256               32            20531
new    15010             5192               32            20234

Signed-off-by: Holger Eitzenberger <holger@eitzenberger.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-10-01 12:47:09 +02:00
Gao feng
afff14f608 netfilter: nfnetlink_log: use proper net to allocate skb
Use proper net struct to allocate skb, otherwise
netlink mmap will be of no effect.

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-10-01 12:46:56 +02:00
Fred Zhou
1f4ffde845 mac80211: improve default WMM parameter setting
Move the default setting for WMM parameters outside the for loop
to avoid redundant assignment multiple times.

Signed-off-by: Fred Zhou <fred.zy@gmail.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-10-01 12:24:29 +02:00
Michal Kazior
0cfcefef19 mac80211: support reporting A-MSDU subframes individually
Some devices may not be able to report A-MSDUs in
single buffers. Drivers for such devices were
forced to re-assemble A-MSDUs which would then
be eventually disassembled by mac80211. This could
lead to CPU cache thrashing and poor performance.

Since A-MSDU has a single sequence number all
subframes share it. This was in conflict with
retransmission/duplication recovery
(IEEE802.11-2012: 9.3.2.10).

Patch introduces a new flag that is meant to be
set for all individually reported A-MSDU subframes
except the last one. This ensures the
last_seq_ctrl is updated after the last subframe
is processed. If an A-MSDU is actually a duplicate
transmission all reported subframes will be
properly discarded.

Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
[johannes: add braces that were missing even before]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-10-01 12:22:03 +02:00
Fred Zhou
15e230abaa mac80211: use exact-size allocation for authentication frame
The authentication frame has a fixied size of 30 bytes
(including header, algo num, trans seq num, and status)
followed by a variable challenge text.
Allocate using exact size, instead of over-allocation
by sizeof(ieee80211_mgmt).

Signed-off-by: Fred Zhou <fred.zy@gmail.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-10-01 12:20:38 +02:00
Gao feng
7433268783 netfilter: nfnetlink_queue: use proper net namespace to allocate skb
Use proper net struct to allocate skb, otherwise netlink mmap
will have no effect.

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-10-01 12:20:31 +02:00
Janusz Dziedzic
f0823475d5 cfg80211: parse dfs region for internal regdb option
Add support for parsing and setting the dfs region (ETSI, FCC, JP)
when the internal regulatory database is used. Before this
the DFS region was being ignored even if present on the used
db.txt

Signed-off-by: Janusz Dziedzic <janusz.dziedzic@tieto.com>
Reviewed-by: Luis R. Rodriguez <mcgrof@do-not-panic.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-10-01 12:18:36 +02:00
Johannes Berg
55fff50113 mac80211: add explicit IBSS driver operations
This can be useful for drivers if they have any failure cases
when joining an IBSS. Also move setting the queue parameters
to before this new call, in case the new driver op needs them
already.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-10-01 12:17:45 +02:00
Eliad Peller
5eb7906b47 ieee80211: fix vht cap definitions
VHT_CAP_BEAMFORMER_ANTENNAS cap is actually defined in the draft as
VHT_CAP_BEAMFORMEE_STS_MAX, and its size is 3 bits long.

VHT_CAP_SOUNDING_DIMENSIONS is also 3 bits long.

Fix the definitions and change the cap masking accordingly.

Signed-off-by: Eliad Peller <eliad@wizery.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-10-01 12:17:08 +02:00
Eliad Peller
f364ef99a8 mac80211: fix some snprintf misuses
In some debugfs related functions snprintf was used
while scnprintf should have been used instead.

(blindly adding the return value of snprintf and supplying
it to the next snprintf might result in buffer overflow when
the input is too big)

Signed-off-by: Eliad Peller <eliad@wizery.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-10-01 12:16:51 +02:00
Fan Du
f59bbdfa5c xfrm: Simplify SA looking up when using wildcard source
__xfrm4/6_state_addr_check is a four steps check, all we need to do
is checking whether the destination address match when looking SA
using wildcard source address. Passing saddr from flow is worst option,
as the checking needs to reach the fourth step while actually only
one time checking will do the work.

So, simplify this process by only checking destination address when
using wildcard source address for looking up SAs.

Signed-off-by: Fan Du <fan.du@windriver.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2013-10-01 10:09:33 +02:00
Fan Du
6f1156383a xfrm: Force SA to be lookup again if SA in acquire state
If SA is in the process of acquiring, which indicates this SA is more
promising and precise than the fall back option, i.e. using wild card
source address for searching less suitable SA.

So, here bail out, and try again.

Signed-off-by: Fan Du <fan.du@windriver.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2013-10-01 10:09:33 +02:00
Salam Noureddine
9260d3e101 ipv6 mcast: use in6_dev_put in timer handlers instead of __in6_dev_put
It is possible for the timer handlers to run after the call to
ipv6_mc_down so use in6_dev_put instead of __in6_dev_put in the
handler function in order to do proper cleanup when the refcnt
reaches 0. Otherwise, the refcnt can reach zero without the
inet6_dev being destroyed and we end up leaking a reference to
the net_device and see messages like the following,

unregister_netdevice: waiting for eth0 to become free. Usage count = 1

Tested on linux-3.4.43.

Signed-off-by: Salam Noureddine <noureddine@aristanetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-30 22:28:58 -07:00
Salam Noureddine
e2401654dd ipv4 igmp: use in_dev_put in timer handlers instead of __in_dev_put
It is possible for the timer handlers to run after the call to
ip_mc_down so use in_dev_put instead of __in_dev_put in the handler
function in order to do proper cleanup when the refcnt reaches 0.
Otherwise, the refcnt can reach zero without the in_device being
destroyed and we end up leaking a reference to the net_device and
see messages like the following,

unregister_netdevice: waiting for eth0 to become free. Usage count = 1

Tested on linux-3.4.43.

Signed-off-by: Salam Noureddine <noureddine@aristanetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-30 22:28:56 -07:00
Hannes Frederic Sowa
3da812d860 ipv6: gre: correct calculation of max_headroom
gre_hlen already accounts for sizeof(struct ipv6_hdr) + gre header,
so initialize max_headroom to zero. Otherwise the

	if (encap_limit >= 0) {
		max_headroom += 8;
		mtu -= 8;
	}

increments an uninitialized variable before max_headroom was reset.

Found with coverity: 728539

Cc: Dmitry Kozlov <xeb@mail.ru>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-30 22:04:09 -07:00
Eric W. Biederman
0bbf87d852 net ipv4: Convert ipv4.ip_local_port_range to be per netns v3
- Move sysctl_local_ports from a global variable into struct netns_ipv4.
- Modify inet_get_local_port_range to take a struct net, and update all
  of the callers.
- Move the initialization of sysctl_local_ports into
   sysctl_net_ipv4.c:ipv4_sysctl_init_net from inet_connection_sock.c

v2:
- Ensure indentation used tabs
- Fixed ip.h so it applies cleanly to todays net-next

v3:
- Compile fixes of strange callers of inet_get_local_port_range.
  This patch now successfully passes an allmodconfig build.
  Removed manual inlining of inet_get_local_port_range in ipv4_local_port_range

Originally-by: Samya <samya@twitter.com>
Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-30 21:59:38 -07:00
stephen hemminger
56d7b53f47 ethernet: use likely() for common Ethernet encap
Mark code path's likely/unlikely based on most common usage.
  * Very few devices use dsa tags.
  * Most traffic is Ethernet (not 802.2)
  * No sane person uses trailer type or Novell encapsulation

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-30 21:52:53 -07:00
stephen hemminger
12861b7bc2 ethernet: cleanup eth_type_trans
Remove old legacy comment and weird if condition.
The comment has outlived it's stay and is throwback to some
early net code (before my time). Maybe Dave remembers what it meant.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-30 21:52:52 -07:00
Eric Dumazet
c9eeec26e3 tcp: TSQ can use a dynamic limit
When TCP Small Queues was added, we used a sysctl to limit amount of
packets queues on Qdisc/device queues for a given TCP flow.

Problem is this limit is either too big for low rates, or too small
for high rates.

Now TCP stack has rate estimation in sk->sk_pacing_rate, and TSO
auto sizing, it can better control number of packets in Qdisc/device
queues.

New limit is two packets or at least 1 to 2 ms worth of packets.

Low rates flows benefit from this patch by having even smaller
number of packets in queues, allowing for faster recovery,
better RTT estimations.

High rates flows benefit from this patch by allowing more than 2 packets
in flight as we had reports this was a limiting factor to reach line
rate. [ In particular if TX completion is delayed because of coalescing
parameters ]

Example for a single flow on 10Gbp link controlled by FQ/pacing

14 packets in flight instead of 2

$ tc -s -d qd
qdisc fq 8001: dev eth0 root refcnt 32 limit 10000p flow_limit 100p
buckets 1024 quantum 3028 initial_quantum 15140
 Sent 1168459366606 bytes 771822841 pkt (dropped 0, overlimits 0
requeues 6822476)
 rate 9346Mbit 771713pps backlog 953820b 14p requeues 6822476
  2047 flow, 2046 inactive, 1 throttled, delay 15673 ns
  2372 gc, 0 highprio, 0 retrans, 9739249 throttled, 0 flows_plimit

Note that sk_pacing_rate is currently set to twice the actual rate, but
this might be refined in the future when a flow is in congestion
avoidance.

Additional change : skb->destructor should be set to tcp_wfree().

A future patch (for linux 3.13+) might remove tcp_limit_output_bytes

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-30 20:41:57 -07:00
Li RongQing
fbadadd90c ipv6: Not need to set fl6.flowi6_flags as zero
setting fl6.flowi6_flags as zero after memset is redundant, Remove it.

Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-30 19:14:11 -04:00
John W. Linville
15214c2f6c Merge branch 'for-john' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211 2013-09-30 16:14:27 -04:00
Eric Dumazet
8d34ce10c5 pkt_sched: fq: qdisc dismantle fixes
fq_reset() should drops all packets in queue, including
throttled flows.

This patch moves code from fq_destroy() to fq_reset()
to do the cleaning.

fq_change() must stop calling fq_dequeue() if all remaining
packets are from throttled flows.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-30 15:51:23 -04:00
stephen hemminger
6459082a3c qdisc: basic classifier - remove unnecessary initialization
err is set once, then first code resets it.
  err = tcf_exts_validate(...)

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Jamal Hadi Salim <hadi@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-30 15:47:43 -04:00
stephen hemminger
0c4e4020f0 qdisc: meta return ENOMEM on alloc failure
Rather than returning earlier value (EINVAL), return ENOMEM if
kzalloc fails. Found while reviewing to find another EINVAL condition.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-30 15:47:43 -04:00
Oliver Smith
7c3ad056ef netfilter: ipset: Add hash:net,port,net module to kernel.
This adds a new set that provides similar functionality to ip,port,net
but permits arbitrary size subnets for both the first and last
parameter.

Signed-off-by: Oliver Smith <oliver@8.c.9.b.0.7.4.0.1.0.0.2.ip6.arpa>
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
2013-09-30 21:42:58 +02:00
Vitaly Lavrov
1785e8f473 netfiler: ipset: Add net namespace for ipset
This patch adds netns support for ipset.

Major changes were made in ip_set_core.c and ip_set.h.
Global variables are moved to per net namespace.
Added initialization code and the destruction of the network namespace ipset subsystem.
In the prototypes of public functions ip_set_* added parameter "struct net*".

The remaining corrections related to the change prototypes of public functions ip_set_*.

The patch for git://git.netfilter.org/ipset.git commit 6a4ec96c0b8caac5c35474e40e319704d92ca347

Signed-off-by: Vitaly Lavrov <lve@guap.ru>
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
2013-09-30 21:42:52 +02:00
Jozsef Kadlecsik
3fd986b3d9 netfilter: ipset: Use a common function at listing the extensions
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
2013-09-30 21:42:36 +02:00
Jozsef Kadlecsik
8ec81f9a4d netfilter: ipset: For set:list types, replaced elements must be zeroed out
The new extensions require zero initialization for the new element
to be added into a slot from where another element was pushed away.

Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
2013-09-30 21:33:29 +02:00
Jozsef Kadlecsik
80571a9ea4 netfilter: ipset: Fix hash resizing with comments
The destroy function must take into account that resizing doesn't
create new extensions so those cannot be destroyed at resize.

Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
2013-09-30 21:33:29 +02:00
Oliver Smith
fda75c6d9e netfilter: ipset: Support comments in hash-type ipsets.
This provides kernel support for creating ipsets with comment support.

This does incur a penalty to flushing/destroying an ipset since all
entries are walked in order to free the allocated strings, this penalty
is of course less expensive than the operation of listing an ipset to
userspace, so for general-purpose usage the overall impact is expected
to be little to none.

Signed-off-by: Oliver Smith <oliver@8.c.9.b.0.7.4.0.1.0.0.2.ip6.arpa>
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
2013-09-30 21:33:29 +02:00
Oliver Smith
81b10bb4bd netfilter: ipset: Support comments in the list-type ipset.
This provides kernel support for creating list ipsets with the comment
annotation extension.

Signed-off-by: Oliver Smith <oliver@8.c.9.b.0.7.4.0.1.0.0.2.ip6.arpa>
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
2013-09-30 21:33:29 +02:00
Oliver Smith
b90cb8ba19 netfilter: ipset: Support comments in bitmap-type ipsets.
This provides kernel support for creating bitmap ipsets with comment
support.

As is the case for hashes, this incurs a penalty when flushing or
destroying the entire ipset as the entries must first be walked in order
to free the comment strings. This penalty is of course far less than the
cost of listing an ipset to userspace. Any set created without support
for comments will be flushed/destroyed as before.

Signed-off-by: Oliver Smith <oliver@8.c.9.b.0.7.4.0.1.0.0.2.ip6.arpa>
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
2013-09-30 21:33:28 +02:00
Oliver Smith
68b63f08d2 netfilter: ipset: Support comments for ipset entries in the core.
This adds the core support for having comments on ipset entries.

The comments are stored as standard null-terminated strings in
dynamically allocated memory after being passed to the kernel. As a
result of this, code has been added to the generic destroy function to
iterate all extensions and call that extension's destroy task if the set
has that extension activated, and if such a task is defined.

Signed-off-by: Oliver Smith <oliver@8.c.9.b.0.7.4.0.1.0.0.2.ip6.arpa>
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
2013-09-30 21:33:28 +02:00
Oliver Smith
ea53ac5b63 netfilter: ipset: Add hash:net,net module to kernel.
This adds a new set that provides the ability to configure pairs of
subnets. A small amount of additional handling code has been added to
the generic hash header file - this code is conditionally activated by a
preprocessor definition.

Signed-off-by: Oliver Smith <oliver@8.c.9.b.0.7.4.0.1.0.0.2.ip6.arpa>
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
2013-09-30 21:33:28 +02:00
Jozsef Kadlecsik
d9628bbeca netfilter: ipset: Kconfig: ipset needs NETFILTER_NETLINK
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
2013-09-30 21:33:28 +02:00
Jozsef Kadlecsik
b91b396d5e netfilter: ipset: list:set: make sure all elements are checked by the gc
When an element timed out, the next one was skipped by the garbage
collector, fixed.

Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
2013-09-30 21:33:27 +02:00
Jozsef Kadlecsik
40cd63bf33 netfilter: ipset: Support extensions which need a per data destroy function
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
2013-09-30 21:33:27 +02:00