linux-uconsole

Author	SHA1	Message	Date
Eric Dumazet	4b549a2ef4	fq_codel: Fair Queue Codel AQM Fair Queue Codel packet scheduler Principles : - Packets are classified (internal classifier or external) on flows. - This is a Stochastic model (as we use a hash, several flows might be hashed on same slot) - Each flow has a CoDel managed queue. - Flows are linked onto two (Round Robin) lists, so that new flows have priority on old ones. - For a given flow, packets are not reordered (CoDel uses a FIFO) - head drops only. - ECN capability is on by default. - Very low memory footprint (64 bytes per flow) tc qdisc ... fq_codel [ limit PACKETS ] [ flows number ] [ target TIME ] [ interval TIME ] [ noecn ] [ quantum BYTES ] defaults : 1024 flows, 10240 packets limit, quantum : device MTU target : 5ms (CoDel default) interval : 100ms (CoDel default) Impressive results on load : class htb 1:1 root leaf 10: prio 0 quantum 1514 rate 200000Kbit ceil 200000Kbit burst 1475b/8 mpu 0b overhead 0b cburst 1475b/8 mpu 0b overhead 0b level 0 Sent 43304920109 bytes 33063109 pkt (dropped 0, overlimits 0 requeues 0) rate 201691Kbit 28595pps backlog 0b 312p requeues 0 lended: 33063109 borrowed: 0 giants: 0 tokens: -912 ctokens: -912 class fq_codel 10:1735 parent 10: (dropped 1292, overlimits 0 requeues 0) backlog 15140b 10p requeues 0 deficit 1514 count 1 lastcount 1 ldelay 7.1ms class fq_codel 10:4524 parent 10: (dropped 1291, overlimits 0 requeues 0) backlog 16654b 11p requeues 0 deficit 1514 count 1 lastcount 1 ldelay 7.1ms class fq_codel 10:4e74 parent 10: (dropped 1290, overlimits 0 requeues 0) backlog 6056b 4p requeues 0 deficit 1514 count 1 lastcount 1 ldelay 6.4ms dropping drop_next 92.0ms class fq_codel 10:628a parent 10: (dropped 1289, overlimits 0 requeues 0) backlog 7570b 5p requeues 0 deficit 1514 count 1 lastcount 1 ldelay 5.4ms dropping drop_next 90.9ms class fq_codel 10:a4b3 parent 10: (dropped 302, overlimits 0 requeues 0) backlog 16654b 11p requeues 0 deficit 1514 count 1 lastcount 1 ldelay 7.1ms class fq_codel 10:c3c2 parent 10: (dropped 1284, overlimits 0 requeues 0) backlog 13626b 9p requeues 0 deficit 1514 count 1 lastcount 1 ldelay 5.9ms class fq_codel 10:d331 parent 10: (dropped 299, overlimits 0 requeues 0) backlog 15140b 10p requeues 0 deficit 1514 count 1 lastcount 1 ldelay 7.0ms class fq_codel 10:d526 parent 10: (dropped 12160, overlimits 0 requeues 0) backlog 35870b 211p requeues 0 deficit 1508 count 12160 lastcount 1 ldelay 15.3ms dropping drop_next 247us class fq_codel 10:e2c6 parent 10: (dropped 1288, overlimits 0 requeues 0) backlog 15140b 10p requeues 0 deficit 1514 count 1 lastcount 1 ldelay 7.1ms class fq_codel 10:eab5 parent 10: (dropped 1285, overlimits 0 requeues 0) backlog 16654b 11p requeues 0 deficit 1514 count 1 lastcount 1 ldelay 5.9ms class fq_codel 10:f220 parent 10: (dropped 1289, overlimits 0 requeues 0) backlog 15140b 10p requeues 0 deficit 1514 count 1 lastcount 1 ldelay 7.1ms qdisc htb 1: root refcnt 6 r2q 10 default 1 direct_packets_stat 0 ver 3.17 Sent 43331086547 bytes 33092812 pkt (dropped 0, overlimits 66063544 requeues 71) rate 201697Kbit 28602pps backlog 0b 260p requeues 71 qdisc fq_codel 10: parent 1:1 limit 10240p flows 65536 target 5.0ms interval 100.0ms ecn Sent 43331086547 bytes 33092812 pkt (dropped 949359, overlimits 0 requeues 0) rate 201697Kbit 28602pps backlog 189352b 260p requeues 0 maxpacket 1514 drop_overlimit 0 new_flow_count 5582 ecn_mark 125593 new_flows_len 0 old_flows_len 11 PING 172.30.42.18 (172.30.42.18) 56(84) bytes of data. 64 bytes from 172.30.42.18: icmp_req=1 ttl=64 time=0.227 ms 64 bytes from 172.30.42.18: icmp_req=2 ttl=64 time=0.165 ms 64 bytes from 172.30.42.18: icmp_req=3 ttl=64 time=0.166 ms 64 bytes from 172.30.42.18: icmp_req=4 ttl=64 time=0.151 ms 64 bytes from 172.30.42.18: icmp_req=5 ttl=64 time=0.164 ms 64 bytes from 172.30.42.18: icmp_req=6 ttl=64 time=0.172 ms 64 bytes from 172.30.42.18: icmp_req=7 ttl=64 time=0.175 ms 64 bytes from 172.30.42.18: icmp_req=8 ttl=64 time=0.183 ms 64 bytes from 172.30.42.18: icmp_req=9 ttl=64 time=0.158 ms 64 bytes from 172.30.42.18: icmp_req=10 ttl=64 time=0.200 ms 10 packets transmitted, 10 received, 0% packet loss, time 8999ms rtt min/avg/max/mdev = 0.151/0.176/0.227/0.022 ms Much better than SFQ because of priority given to new flows, and fast path dirtying less cache lines. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-05-12 15:53:42 -04:00
Linus Walleij	d5543206b2	usb/net: rndis: move bus message definition This moves the bus message definition to land together with the other message types. This message is not used in the kernel but I'm keeping it anyway. Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-05-12 15:15:20 -04:00
Linus Walleij	e20289ed3f	usb/net: rndis: fixup a few name prefixes This switches a horde of NDIS_-prefixed variables to the RNDIS_ prefix. Most of them aren't used much and causes no changes. Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-05-12 15:13:39 -04:00
Linus Walleij	514911678f	usb/net: rndis: merge command codes Switch the hyperv filter and rndis gadget driver to use the same command enumerators as the other drivers and delete the surplus command codes. Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-05-12 15:11:18 -04:00
Linus Walleij	c80174f3da	usb/net: rndis: move and namespace PnP defines This moves the PnP OID definitions to the RNDIS_* namespace and puts them in the next falling slot in the list. Oh, the comment above the PnP defines was referring to some obsolete or out-of-tree driver so removed it, and removed my own comments telling where each header segment came from as well, we have moved everything around by this point anyway. Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-05-12 15:10:18 -04:00
Linus Walleij	b101943209	usb/net: rndis: delete duplicate packet types The NDIS_-prefixed packet types have equivalent RNDIS_- prefixed types, besides nothing in the kernel use these defines. Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-05-12 15:08:46 -04:00
Linus Walleij	17c51b6ccc	usb/net: rndis: merge media type definitions Let's have a unified table of RNDIS media. We used to have a similar table with NDIS_* prefix from the gadget driver, but since we're only using RNDIS in the kernel (IIRC NDIS, non-remote, is for the windows- internal network drivers so what do we care) let's prefix everything with RNDIS. Some of the definitions were conflicting, in one of the defines 0x0B is bearer "CO WAN" and in two others "BPC". Well I took the majority vote. Two definition of medium 0x09 calls it "wireless WAN" but one vote for "wireless LAN" but in this case I am sticking with the minority, "Wide Area Network" does not make much sense in this case as far as I can tell. NOTE: latin singular and plural is so screwed up in these defines that it makes my eyes bleed. But I will not attempt to submit a patch converting all use of _MEDIA_ to _MEDIUM_ while I can probably tell from the semantics of the code that RNDIS_MEDIA_STATE_CONNECTED is most probably (erroneously) referring to a singular, unless it can return an array of connected media. I suspect these erroneous plurals are used in documentation and such so I don't want to mess around with things for no functional change. Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-05-12 15:08:06 -04:00
Linus Walleij	91d6aef7d1	usb/net: rndis: group all status codes together Move all RNDIS status codes so they appear in rising order and in one place of the header file. Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-05-12 15:07:12 -04:00
Linus Walleij	c3ef5eae86	usb/net: rndis: delete surplus defines These defines are not used in the kernel, and they have duplicate definitions under the RNDIS_* prefix. Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-05-12 15:06:42 -04:00
Linus Walleij	4cc6c4d584	usb/net: rndis: merge duplicate 802_* OIDs The 802_* network OIDs were duplicated, so let's merge them and use the RNDIS_* prefixed definitions from the hyperV driver. Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-05-12 15:05:59 -04:00
Linus Walleij	8cdddc3f9d	usb/net: rndis: eliminate first set of duplicate OIDs The RNDIS protocol contains a vast number of Object ID:s (OIDs). The current definitions had multiple definitions of these ID:s, let's use the nicely RNDIS_*-prefixed defines from the HyperV implementation, rename everywhere they're used, and copy+rename the few that were missing from this list of objects. Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-05-12 15:04:19 -04:00
Linus Walleij	007e5c8e6a	usb/net: rndis: remove ambigous status codes The RNDIS status codes are redefined with much stranged ifdeffery and only one of these codes was used in the hyperv driver, and there it is very clearly referring to the RNDIS variant, not some other status. So clarify this by explictly using the RNDIS_* prefixed status code in the hyperv drivera and delete the duplicate defines. Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-05-12 15:03:14 -04:00
Linus Walleij	7591157e18	usb/net: rndis: break out <linux/rndis.h> defines As a first step to consolidate the RNDIS implementations, break out a common file with all the #defines and move it to <linux/rndis.h>. This also deletes the immediate duplicated defines in the <linux/rndis.h> file that yields a lot of compilation warnings. Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-05-12 15:02:22 -04:00
Linus Walleij	7390e8b0de	usb/net: rndis: inline the cpu_to_le32() macro The header file <linux/usb/rndis_host.h> used a number of #defines that included the cpu_to_le32() macro to assure the result will be in LE endianness. Inlining this into the code instead of using it in the code definitions yields consolidation opportunities later on as you will see in the following patches. The individual drivers also used local defines - all are switched over to the pattern of doing the conversion at the call sites instead. Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi> Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-05-12 15:00:45 -04:00
Eric Dumazet	76e3cc126b	codel: Controlled Delay AQM An implementation of CoDel AQM, from Kathleen Nichols and Van Jacobson. http://queue.acm.org/detail.cfm?id=2209336 This AQM main input is no longer queue size in bytes or packets, but the delay packets stay in (FIFO) queue. As we don't have infinite memory, we still can drop packets in enqueue() in case of massive load, but mean of CoDel is to drop packets in dequeue(), using a control law based on two simple parameters : target : target sojourn time (default 5ms) interval : width of moving time window (default 100ms) Based on initial work from Dave Taht. Refactored to help future codel inclusion as a plugin for other linux qdisc (FQ_CODEL, ...), like RED. include/net/codel.h contains codel algorithm as close as possible than Kathleen reference. net/sched/sch_codel.c contains the linux qdisc specific glue. Separate structures permit a memory efficient implementation of fq_codel (to be sent as a separate work) : Each flow has its own struct codel_vars. timestamps are taken at enqueue() time with 1024 ns precision, allowing a range of 2199 seconds in queue, and 100Gb links support. iproute2 uses usec as base unit. Selected packets are dropped, unless ECN is enabled and packets can get ECN mark instead. Tested from 2Mb to 10Gb speeds with no particular problems, on ixgbe and tg3 drivers (BQL enabled). Usage: tc qdisc ... codel [ limit PACKETS ] [ target TIME ] [ interval TIME ] [ ecn ] qdisc codel 10: parent 1:1 limit 2000p target 3.0ms interval 60.0ms ecn Sent 13347099587 bytes 8815805 pkt (dropped 0, overlimits 0 requeues 0) rate 202365Kbit 16708pps backlog 113550b 75p requeues 0 count 116 lastcount 98 ldelay 4.3ms dropping drop_next 816us maxpacket 1514 ecn_mark 84399 drop_overlimit 0 CoDel must be seen as a base module, and should be used keeping in mind there is still a FIFO queue. So a typical setup will probably need a hierarchy of several qdiscs and packet classifiers to be able to meet whatever constraints a user might have. One possible example would be to use fq_codel, which combines Fair Queueing and CoDel, in replacement of sfq / sfq_red. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Dave Taht <dave.taht@bufferbloat.net> Cc: Kathleen Nichols <nichols@pollere.com> Cc: Van Jacobson <van@pollere.net> Cc: Tom Herbert <therbert@google.com> Cc: Matt Mathis <mattmathis@google.com> Cc: Yuchung Cheng <ycheng@google.com> Cc: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-05-10 23:35:02 -04:00
Joe Perches	baf523c9ba	etherdevice.h: Add ether_addr_equal_64bits Add an optimized boolean function to check if 2 ethernet addresses are the same. This is to avoid any confusion about compare_ether_addr_64bits returning an unsigned, and not being able to use the compare_ether_addr_64bits function for sorting ala memcmp. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-05-10 23:33:01 -04:00
David S. Miller	ae535ba448	Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/bwh/sfc-next	2012-05-09 22:51:17 -04:00
Stuart Hodgson	41c3cb6d20	ethtool: Extend the ethtool API to obtain plugin module eeprom data ETHTOOL_GMODULEINFO returns a new struct ethtool_modinfo that will return the type and size of plug-in module eeprom (such as SFP+) for parsing by userland program. ETHTOOL_GMODULEEEPROM returns the raw eeprom information using the existing ethtool_eeprom structture to return the data Signed-off-by: Stuart Hodgson <smhodgson@solarflare.com> Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>	2012-05-10 02:22:17 +01:00
Joe Perches	a599b0f54d	etherdevice.h: Add ether_addr_equal Add a boolean function to check if 2 ethernet addresses are the same. This is to avoid any confusion about compare_ether_addr returning an unsigned, and not being able to use the compare_ether_addr function for sorting ala memcmp. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-05-09 20:49:16 -04:00
Florian Westphal	0197dee7d3	netfilter: hashlimit: byte-based limit mode can be used e.g. for ingress traffic policing or to detect when a host/port consumes more bandwidth than expected. This is done by optionally making cost to mean "cost per 16-byte-chunk-of-data" instead of "cost per packet". Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2012-05-09 13:04:57 +02:00
Hans Schillstrom	cf308a1fae	netfilter: add xt_hmark target for hash-based skb marking The target allows you to create rules in the "raw" and "mangle" tables which set the skbuff mark by means of hash calculation within a given range. The nfmark can influence the routing method (see "Use netfilter MARK value as routing key") and can also be used by other subsystems to change their behaviour. [ Part of this patch has been refactorized and modified by Pablo Neira Ayuso ] Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2012-05-09 12:54:05 +02:00
Hans Schillstrom	84018f55ab	netfilter: ip6_tables: add flags parameter to ipv6_find_hdr() This patch adds the flags parameter to ipv6_find_hdr. This flags allows us to: * know if this is a fragment. * stop at the AH header, so the information contained in that header can be used for some specific packet handling. This patch also adds the offset parameter for inspection of one inner IPv6 header that is contained in error messages. Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2012-05-09 12:53:47 +02:00
David S. Miller	9bb862beb6	Merge branch 'master' of git://1984.lsi.us.es/net-next	2012-05-08 14:40:21 -04:00
Pablo Neira Ayuso	d16cf20e2f	netfilter: remove ip_queue support This patch removes ip_queue support which was marked as obsolete years ago. The nfnetlink_queue modules provides more advanced user-space packet queueing mechanism. This patch also removes capability code included in SELinux that refers to ip_queue. Otherwise, we break compilation. Several warning has been sent regarding this to the mailing list in the past month without anyone rising the hand to stop this with some strong argument. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2012-05-08 20:25:42 +02:00
Pablo Neira Ayuso	6714cf5465	netfilter: nf_conntrack: fix explicit helper attachment and NAT Explicit helper attachment via the CT target is broken with NAT if non-standard ports are used. This problem was hidden behind the automatic helper assignment routine. Thus, it becomes more noticeable now that we can disable the automatic helper assignment with Eric Leblond's: 9e8ac5a netfilter: nf_ct_helper: allow to disable automatic helper assignment Basically, nf_conntrack_alter_reply asks for looking up the helper up if NAT is enabled. Unfortunately, we don't have the conntrack template at that point anymore. Since we don't want to rely on the automatic helper assignment, we can skip the second look-up and stick to the helper that was attached by iptables. With the CT target, the user is in full control of helper attachment, thus, the policy is to trust what the user explicitly configures via iptables (no automatic magic anymore). Interestingly, this bug was hidden by the automatic helper look-up code. But it can be easily trigger if you attach the helper in a non-standard port, eg. iptables -I PREROUTING -t raw -p tcp --dport 8888 \ -j CT --helper ftp And you disabled the automatic helper assignment. I added the IPS_HELPER_BIT that allows us to differenciate between a helper that has been explicitly attached and those that have been automatically assigned. I didn't come up with a better solution (having backward compatibility in mind). Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2012-05-08 19:44:42 +02:00
Julian Anastasov	cdcc5e905d	ipvs: always update some of the flags bits in backup As the goal is to mirror the inactconns/activeconns counters in the backup server, make sure the cp->flags are updated even if cp is still not bound to dest. If cp->flags are not updated ip_vs_bind_dest will rely only on the initial flags when updating the counters. To avoid mistakes and complicated checks for protocol state rely only on the IP_VS_CONN_F_INACTIVE bit when updating the counters. Signed-off-by: Julian Anastasov <ja@ssi.bg> Tested-by: Aleksey Chudov <aleksey.chudov@gmail.com> Signed-off-by: Simon Horman <horms@verge.net.au>	2012-05-08 19:38:31 +02:00
Joe Perches	b44907e64c	etherdev.h: Convert int is_<foo>_ether_addr to bool Make the return value explicitly true or false. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-05-08 13:06:16 -04:00
David S. Miller	0d6c4a2e46	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Conflicts: drivers/net/ethernet/intel/e1000e/param.c drivers/net/wireless/iwlwifi/iwl-agn-rx.c drivers/net/wireless/iwlwifi/iwl-trans-pcie-rx.c drivers/net/wireless/iwlwifi/iwl-trans.h Resolved the iwlwifi conflict with mainline using 3-way diff posted by John Linville and Stephen Rothwell. In 'net' we added a bug fix to make iwlwifi report a more accurate skb->truesize but this conflicted with RX path changes that happened meanwhile in net-next. In e1000e a conflict arose in the validation code for settings of adapter->itr. 'net-next' had more sophisticated logic so that logic was used. Signed-off-by: David S. Miller <davem@davemloft.net>	2012-05-07 23:35:40 -04:00
David Daney	0ca2997d14	netdev/of/phy: Add MDIO bus multiplexer support. This patch adds a somewhat generic framework for MDIO bus multiplexers. It is modeled on the I2C multiplexer. The multiplexer is needed if there are multiple PHYs with the same address connected to the same MDIO bus adepter, or if there is insufficient electrical drive capability for all the connected PHY devices. Conceptually it could look something like this: ------------------ \| Control Signal \| --------+--------- \| --------------- --------+------ \| MDIO MASTER \|---\| Multiplexer \| --------------- --+-------+---- \| \| C C h h i i l l d d \| \| --------- A B --------- \| \| \| \| \| \| \| PHY@1 +-------+ +---+ PHY@1 \| \| \| \| \| \| \| --------- \| \| --------- --------- \| \| --------- \| \| \| \| \| \| \| PHY@2 +-------+ +---+ PHY@2 \| \| \| \| \| --------- --------- This framework configures the bus topology from device tree data. The mechanics of switching the multiplexer is left to device specific drivers. The follow-on patch contains a multiplexer driven by GPIO lines. Signed-off-by: David Daney <david.daney@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-05-07 22:58:09 -04:00
David Daney	2510602200	netdev/of/phy: New function: of_mdio_find_bus(). Add of_mdio_find_bus() which allows an mii_bus to be located given its associated the device tree node. This is needed by the follow-on patch to add a driver for MDIO bus multiplexers. The of_mdiobus_register() function is modified so that the device tree node is recorded in the mii_bus. Then we can find it again by iterating over all mdio_bus_class devices. Because the OF device tree has now become an integral part of the kernel, this can live in mdio_bus.c (which contains the needed mdio_bus_class structure) instead of of_mdio.c. Signed-off-by: David Daney <david.daney@cavium.com> Cc: Grant Likely <grant.likely@secretlab.ca> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-05-07 22:58:09 -04:00
Johannes Berg	1c430a727f	net: compare_ether_addr[_64bits]() has no ordering Neither compare_ether_addr() nor compare_ether_addr_64bits() (as it can fall back to the former) have comparison semantics like memcmp() where the sign of the return value indicates sort order. We had a bug in the wireless code due to a blind memcmp replacement because of this. A cursory look suggests that the wireless bug was the only one due to this semantic difference. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-05-07 19:21:29 -04:00
Alexander Duyck	ec47ea8247	skb: Add inline helper for getting the skb end offset from head With the recent changes for how we compute the skb truesize it occurs to me we are probably going to have a lot of calls to skb_end_pointer - skb->head. Instead of running all over the place doing that it would make more sense to just make it a separate inline skb_end_offset(skb) that way we can return the correct value without having gcc having to do all the optimization to cancel out skb->head - skb->head. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-05-06 13:13:19 -04:00
Karsten Keil	f45ebf3a6b	mISDN: Help to identify the card With multiple cards is hard to figure out which port caused trouble int the layer2 routines (e.g. got a timeout). Now we have the informations in the log output. Signed-off-by: Karsten Keil <kkeil@linux-pingi.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-05-04 11:56:19 -04:00
Karsten Keil	c626c12727	mISDN: Make layer1 timer 3 value configurable For certification test it is very useful to change the layer1 timer3 value on runtime. Signed-off-by: Karsten Keil <kkeil@linux-pingi.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-05-04 11:55:05 -04:00
Karsten Keil	8423e6b212	mISDN: L2 timeouts need to be queued as L2 event To be full preemptiv safe, we cannot handle a L2 timeout in the timer context itself, we should do all actions via the D-channel thread. Signed-off-by: Karsten Keil <kkeil@linux-pingi.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-05-04 11:54:27 -04:00
Linus Torvalds	c42f1d4b52	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Pull networking fixes from David Miller: 1) Transfer padding was wrong for full-speed USB in ASIX driver, fix from Ingo van Lil. 2) Propagate the negative packet offset fix into the PowerPC BPF JIT. From Jan Seiffert. 3) dl2k driver's private ioctls were letting unprivileged tasks make MII writes and other ugly bits like that. Fix from Jeff Mahoney. 4) Fix TX VLAN and RX packet drops in ucc_geth, from Joakim Tjernlund. 5) OOPS and network namespace fixes in IPVS from Hans Schillstrom and Julian Anastasov. 6) Fix races and sleeping in locked context bugs in drop_monitor, from Neil Horman. 7) Fix link status indication in smsc95xx driver, from Paolo Pisati. 8) Fix bridge netfilter OOPS, from Peter Huang. 9) L2TP sendmsg can return on error conditions with the socket lock held, oops. Fix from Sasha Levin. 10) udp_diag should return meaningful values for socket memory usage, from Shan Wei. 11) Eric Dumazet is so awesome he gets his own section: Socket memory cgroup code (I never should have applied those patches, grumble...) made erroneous changes to sk_sockets_allocated_read_positive(). It was changed to use percpu_counter_sum_positive (which requires BH disabling) instead of percpu_counter_read_positive (which does not). Revert back to avoid crashes and lockdep warnings. Adjust the default tcp_adv_win_scale and tcp_rmem[2] values to fix throughput regressions. This is necessary as a result of our more precise skb->truesize tracking. Fix SKB leak in netem packet scheduler. 12) New device IDs for various bluetooth devices, from Manoj Iyer, AceLan Kao, and Steven Harms. 13) Fix command completion race in ipw2200, from Stanislav Yakovlev. 14) Fix rtlwifi oops on unload, from Larry Finger. 15) Fix hard_mtu when adjusting hard_header_len in smsc95xx driver. From Stephane Fillod. 16) ehea driver registers it's IRQ before all the necessary state is setup, resulting in crashes. Fix from Thadeu Lima de Souza Cascardo. 17) Fix PHY connection failures in davinci_emac driver, from Anatolij Gustschin. 18) Missing break; in switch statement in bluetooth's hci_cmd_complete_evt(). Fix from Szymon Janc. 19) Fix queue programming in iwlwifi, from Johannes Berg. 20) Interrupt throttling defaults not being actually programmed into the hardware, fix from Jeff Kirsher and Ying Cai. 21) TLAN driver SKB encoding in descriptor busted on 64-bit, fix from Benjamin Poirier. 22) Fix blind status block RX producer pointer deref in TG3 driver, from Matt Carlson. 23) Promisc and multicast are busted on ehea, fixes from Thadeu Lima de Souza Cascardo. 24) Fix crashes in 6lowpan, from Alexander Smirnov. 25) tcp_complete_cwr() needs to be careful to not rewind the CWND to ssthresh if ssthresh has the "infinite" value. Fix from Yuchung Cheng. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (81 commits) sungem: Fix WakeOnLan tcp: change tcp_adv_win_scale and tcp_rmem[2] net: l2tp: unlock socket lock before returning from l2tp_ip_sendmsg drop_monitor: prevent init path from scheduling on the wrong cpu usbnet: fix failure handling in usbnet_probe usbnet: fix leak of transfer buffer of dev->interrupt ucc_geth: Add 16 bytes to max TX frame for VLANs net: ucc_geth, increase no. of HW RX descriptors netem: fix possible skb leak sky2: fix receive length error in mixed non-VLAN/VLAN traffic sky2: propogate rx hash when packet is copied net: fix two typos in skbuff.h cxgb3: Don't call cxgb_vlan_mode until q locks are initialized ixgbe: fix calling skb_put on nonlinear skb assertion bug ixgbe: Fix a memory leak in IEEE DCB igbvf: fix the bug when initializing the igbvf smsc75xx: enable mac to detect speed/duplex from phy smsc75xx: declare smsc75xx's MII as GMII capable smsc75xx: fix phy interrupt acknowledge smsc75xx: fix phy init reset loop ...	2012-05-03 17:10:39 -07:00
Alexander Duyck	3a7c1ee4ab	skb: Add skb_head_is_locked helper function This patch adds support for a skb_head_is_locked helper function. It is meant to be used any time we are considering transferring the head from skb->head to a paged frag. If the head is locked it means we cannot remove the head from the skb so it must be copied or we must take the skb as a whole. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-05-03 13:18:37 -04:00
Yuchung Cheng	750ea2bafa	tcp: early retransmit: delayed fast retransmit Implementing the advanced early retransmit (sysctl_tcp_early_retrans==2). Delays the fast retransmit by an interval of RTT/4. We borrow the RTO timer to implement the delay. If we receive another ACK or send a new packet, the timer is cancelled and restored to original RTO value offset by time elapsed. When the delayed-ER timer fires, we enter fast recovery and perform fast retransmit. Signed-off-by: Yuchung Cheng <ycheng@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-05-02 20:56:10 -04:00
Yuchung Cheng	eed530b6c6	tcp: early retransmit This patch implements RFC 5827 early retransmit (ER) for TCP. It reduces DUPACK threshold (dupthresh) if outstanding packets are less than 4 to recover losses by fast recovery instead of timeout. While the algorithm is simple, small but frequent network reordering makes this feature dangerous: the connection repeatedly enter false recovery and degrade performance. Therefore we implement a mitigation suggested in the appendix of the RFC that delays entering fast recovery by a small interval, i.e., RTT/4. Currently ER is conservative and is disabled for the rest of the connection after the first reordering event. A large scale web server experiment on the performance impact of ER is summarized in section 6 of the paper "Proportional Rate Reduction for TCP”, IMC 2011. http://conferences.sigcomm.org/imc/2011/docs/p155.pdf Note that Linux has a similar feature called THIN_DUPACK. The differences are THIN_DUPACK do not mitigate reorderings and is only used after slow start. Currently ER is disabled if THIN_DUPACK is enabled. I would be happy to merge THIN_DUPACK feature with ER if people think it's a good idea. ER is enabled by sysctl_tcp_early_retrans: 0: Disables ER 1: Reduce dupthresh to packets_out - 1 when outstanding packets < 4. 2: (Default) reduce dupthresh like mode 1. In addition, delay entering fast recovery by RTT/4. Note: mode 2 is implemented in the third part of this patch series. Signed-off-by: Yuchung Cheng <ycheng@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-05-02 20:56:10 -04:00
Eric Dumazet	d961949660	net: fix two typos in skbuff.h fix kernel doc typos in function names Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-05-01 09:40:19 -04:00
Eric Dumazet	e4ae004b84	netem: add ECN capability Add ECN (Explicit Congestion Notification) marking capability to netem tc qdisc add dev eth0 root netem drop 0.5 ecn Instead of dropping packets, try to ECN mark them. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: Tom Herbert <therbert@google.com> Cc: Hagen Paul Pfeifer <hagen@jauu.net> Cc: Stephen Hemminger <shemminger@vyatta.com> Acked-by: Hagen Paul Pfeifer <hagen@jauu.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-05-01 09:39:48 -04:00
Eric Dumazet	18d0700024	net: skb_peek()/skb_peek_tail() cleanups remove useless casts and rename variables for less confusion. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-05-01 09:39:48 -04:00
Chris Elston	a32e0eec70	l2tp: introduce L2TPv3 IP encapsulation support for IPv6 L2TPv3 defines an IP encapsulation packet format where data is carried directly over IP (no UDP). The kernel already has support for L2TP IP encapsulation over IPv4 (l2tp_ip). This patch introduces support for L2TP IP encapsulation over IPv6. The implementation is derived from ipv6/raw and ipv4/l2tp_ip. Signed-off-by: Chris Elston <celston@katalix.com> Signed-off-by: James Chapman <jchapman@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-05-01 09:30:55 -04:00
Chris Elston	f9bac8df90	l2tp: netlink api for l2tpv3 ipv6 unmanaged tunnels This patch adds support for unmanaged L2TPv3 tunnels over IPv6 using the netlink API. We already support unmanaged L2TPv3 tunnels over IPv4. A patch to iproute2 to make use of this feature will be submitted separately. Signed-off-by: Chris Elston <celston@katalix.com> Signed-off-by: James Chapman <jchapman@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-05-01 09:30:55 -04:00
James Chapman	9d4ec1aeda	pppox: Replace __attribute__((packed)) in if_pppox.h Checkpatch warns about the use of __attribute__((packed)). So use the recommended __packed syntax instead. Signed-off-by: James Chapman <jchapman@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-05-01 09:30:55 -04:00
Eric Dumazet	d7e8883cfc	net: make GRO aware of skb->head_frag GRO can check if skb to be merged has its skb->head mapped to a page fragment, instead of a kmalloc() area. We 'upgrade' skb->head as a fragment in itself This avoids the frag_list fallback, and permits to build true GRO skb (one sk_buff and up to 16 fragments), using less memory. This reduces number of cache misses when user makes its copy, since a single sk_buff is fetched. This is a followup of patch "net: allow skb->head to be a page fragment" Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Maciej Żenczykowski <maze@google.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: Tom Herbert <therbert@google.com> Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Cc: Ben Hutchings <bhutchings@solarflare.com> Cc: Matt Carlson <mcarlson@broadcom.com> Cc: Michael Chan <mchan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-04-30 21:35:49 -04:00
Eric Dumazet	d3836f21b0	net: allow skb->head to be a page fragment skb->head is currently allocated from kmalloc(). This is convenient but has the drawback the data cannot be converted to a page fragment if needed. We have three spots were it hurts : 1) GRO aggregation When a linear skb must be appended to another skb, GRO uses the frag_list fallback, very inefficient since we keep all struct sk_buff around. So drivers enabling GRO but delivering linear skbs to network stack aren't enabling full GRO power. 2) splice(socket -> pipe). We must copy the linear part to a page fragment. This kind of defeats splice() purpose (zero copy claim) 3) TCP coalescing. Recently introduced, this permits to group several contiguous segments into a single skb. This shortens queue lengths and save kernel memory, and greatly reduce probabilities of TCP collapses. This coalescing doesnt work on linear skbs (or we would need to copy data, this would be too slow) Given all these issues, the following patch introduces the possibility of having skb->head be a fragment in itself. We use a new skb flag, skb->head_frag to carry this information. build_skb() is changed to accept a frag_size argument. Drivers willing to provide a page fragment instead of kmalloc() data will set a non zero value, set to the fragment size. Then, on situations we need to convert the skb head to a frag in itself, we can check if skb->head_frag is set and avoid the copies or various fallbacks we have. This means drivers currently using frags could be updated to avoid the current skb->head allocation and reduce their memory footprint (aka skb truesize). (thats 512 or 1024 bytes saved per skb). This also makes bpf/netfilter faster since the 'first frag' will be part of skb linear part, no need to copy data. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Maciej Żenczykowski <maze@google.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: Tom Herbert <therbert@google.com> Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Cc: Ben Hutchings <bhutchings@solarflare.com> Cc: Matt Carlson <mcarlson@broadcom.com> Cc: Michael Chan <mchan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-04-30 21:35:11 -04:00
Linus Torvalds	e7a7c9ab41	SCSI fixes on 20120430 -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.18 (GNU/Linux) iQEcBAABAgAGBQJPnjtXAAoJEDeqqVYsXL0MNWkIAMYc5n06Ac/gdLY8xIIYD6GA OzLd/NQ2Q4RqZJsX26OueX2ZjMRYDixBAmiw837K9PvrLUHGfMAaninNyzDgVZ3R pbIK45OylIQrvMAl/R7ZzqihfxJjg5utqEUt1M2qv/ikkBJ8+zWAJTQKxRYJ1OAW 9QDhIP986hljNyZfDuqXBvA4XkngYeCO0WLjBOeYCseSzeV8vOLpmsD/ANHDcqA6 Ct4uM4KJWF4jHz3sN3w5T3morNwvh42onNcy++911HcH5Nc9bkOqBWQAZ5RtINp9 8rkUz3OtCCQpZz6ffF3ZXJaopuU3ELfP80t03OScr2T75SxLqLf6mFYVAcR7RpQ= =nXrE -----END PGP SIGNATURE----- Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull SCSI fixes from James Bottomley: "This is a set of SAS and SATA fixes; there are one or two longstanding bug fixes, but most of this is regression fixes." * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: [SCSI] libfc: update mfs boundry checking [SCSI] Revert "[SCSI] libsas: fix sas port naming" [SCSI] libsas: fix false positive 'device attached' conditions [SCSI] libsas, libata: fix start of life for a sas ata_port [SCSI] libsas: fix ata_eh clobbering ex_phys via smp_ata_check_ready [SCSI] libsas: unify domain_device sas_rphy lifetimes [SCSI] libsas: fix sas_get_port_device regression [SCSI] libsas: fix sas_find_bcast_phy() in the presence of 'vacant' phys [SCSI] libsas: introduce sas_work to fix sas_drain_work vs sas_queue_work [SCSI] libata: Pass correct DMA device to scsi host [SCSI] scsi_lib: use correct DMA device in __scsi_alloc_queue	2012-04-30 15:33:50 -07:00
Matthew Garrett	41b3254c93	efi: Add new variable attributes More recent versions of the UEFI spec have added new attributes for variables. Add them. Signed-off-by: Matthew Garrett <mjg@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-04-30 15:30:18 -07:00
Linus Torvalds	9883035ae7	pipes: add a "packetized pipe" mode for writing The actual internal pipe implementation is already really about individual packets (called "pipe buffers"), and this simply exposes that as a special packetized mode. When we are in the packetized mode (marked by O_DIRECT as suggested by Alan Cox), a write() on a pipe will not merge the new data with previous writes, so each write will get a pipe buffer of its own. The pipe buffer is then marked with the PIPE_BUF_FLAG_PACKET flag, which in turn will tell the reader side to break the read at that boundary (and throw away any partial packet contents that do not fit in the read buffer). End result: as long as you do writes less than PIPE_BUF in size (so that the pipe doesn't have to split them up), you can now treat the pipe as a packet interface, where each read() system call will read one packet at a time. You can just use a sufficiently big read buffer (PIPE_BUF is sufficient, since bigger than that doesn't guarantee atomicity anyway), and the return value of the read() will naturally give you the size of the packet. NOTE! We do not support zero-sized packets, and zero-sized reads and writes to a pipe continue to be no-ops. Also note that big packets will currently be split at write time, but that the size at which that happens is not really specified (except that it's bigger than PIPE_BUF). Currently that limit is the system page size, but we might want to explicitly support bigger packets some day. The main user for this is going to be the autofs packet interface, allowing us to stop having to care so deeply about exact packet sizes (which have had bugs with 32/64-bit compatibility modes). But user space can create packetized pipes with "pipe2(fd, O_DIRECT)", which will fail with an EINVAL on kernels that do not support this interface. Tested-by: Michael Tokarev <mjt@tls.msk.ru> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: David Miller <davem@davemloft.net> Cc: Ian Kent <raven@themaw.net> Cc: Thomas Meyer <thomas@m3y3r.de> Cc: stable@kernel.org # needed for systemd/autofs interaction fix Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-04-29 13:12:42 -07:00

1 2 3 4 5 ...

29,074 commits