linux-pinenote/drivers/net
Christoph Schulz a8a3e41c67 net: pppoe: use correct channel MTU when using Multilink PPP
The PPP channel MTU is used with Multilink PPP when ppp_mp_explode() (in the
ppp_generic module) determines how big a fragment may be. According to RFC
1661, the MTU excludes the 2-byte PPP protocol field; see the corresponding
comment and code in ppp_mp_explode():

		/*
		 * hdrlen includes the 2-byte PPP protocol field, but the
		 * MTU counts only the payload excluding the protocol field.
		 * (RFC1661 Section 2)
		 */
		mtu = pch->chan->mtu - (hdrlen - 2);

However, the pppoe module *does* include the PPP protocol field in the channel
MTU. This is wrong: under certain circumstances it makes the PPP payload 1-2
bytes too big (one byte if PPP protocol compression is used, two otherwise),
and the generated Ethernet packets are dropped. The pppoe module therefore has
to subtract two bytes from the channel MTU. This error only manifests itself
with Multilink PPP, because otherwise the channel MTU is not used anywhere.
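To make the arithmetic concrete, here is a small C sketch. It computes the channel MTU a PPPoE channel would advertise before and after subtracting the protocol field, then feeds each value through the ppp_mp_explode() formula quoted above. This is not the pppoe.c code. The helper names are hypothetical, the 1500-byte Ethernet MTU is an assumption, and hdrlen = 6 assumes a 4-byte MP header plus the 2-byte protocol field.

```c
/*
 * Sketch only: hypothetical helpers, not the pppoe.c implementation.
 * Assumes a plain Ethernet device and the long-sequence MP header.
 */
#define PPPOE_HDR_LEN 6  /* PPPoE header: ver/type, code, session ID, length */
#define PPP_PROTO_LEN 2  /* PPP protocol field, excluded from the MTU (RFC 1661) */
#define MP_HDRLEN     6  /* assumed hdrlen: 4-byte MP header + 2-byte protocol field */

/* Channel MTU as advertised before the fix: the protocol field is counted. */
static int pppoe_chan_mtu_buggy(int eth_mtu)
{
	return eth_mtu - PPPOE_HDR_LEN;
}

/* Channel MTU with the 2-byte PPP protocol field subtracted as well. */
static int pppoe_chan_mtu_fixed(int eth_mtu)
{
	return eth_mtu - PPPOE_HDR_LEN - PPP_PROTO_LEN;
}

/* Per-fragment payload limit, same formula as the ppp_mp_explode() excerpt. */
static int mp_fragment_mtu(int chan_mtu, int hdrlen)
{
	return chan_mtu - (hdrlen - 2);
}
```

With eth_mtu = 1500, the buggy variant gives a channel MTU of 1494 and a fragment payload limit of 1490. The fixed variant gives 1492 and 1488. Since 1488 + 4 (MP header) + 2 (protocol field) + 6 (PPPoE header) = 1500, a full-size fragment then fits an Ethernet payload exactly.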

In the following, I will describe how to reproduce this bug. We configure two
pppd instances for multilink PPP over two PPPoE links, say eth2 and eth3, with
an MTU of 1492 bytes for each link and an MRRU of 2976 bytes. (This MRRU is
computed by adding the two link MTUs and subtracting the 4-byte MP header
twice, once per link.) The necessary pppd statements on both sides are "multilink
mtu 1492 mru 1492 mrru 2976". On the client side, we additionally need "plugin
rp-pppoe.so eth2" and "plugin rp-pppoe.so eth3", respectively; on the server
side, we additionally need to start two pppoe-server instances to be able to
establish two PPPoE sessions, one over eth2 and one over eth3. We set the MTU
of the PPP network interface to the MRRU (2976) on both sides of the connection
in order to make use of the higher bandwidth. (If we didn't do that, IP
fragmentation would kick in, which we want to avoid.)
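The MRRU rule described above can be written as a small helper. The name mrru_for_links() is made up for illustration. The 4-byte MP header (long sequence numbers) is taken from the text.

```c
#include <stddef.h>

#define MP_HDR_LEN 4  /* MP header with long sequence numbers, as in the text */

/* Hypothetical helper: sum of the link MTUs minus one MP header per link. */
static int mrru_for_links(const int *link_mtu, size_t n_links)
{
	int mrru = 0;

	for (size_t i = 0; i < n_links; i++)
		mrru += link_mtu[i] - MP_HDR_LEN;
	return mrru;
}
```

For two links with an MTU of 1492 each, this gives 1492 + 1492 - 2 * 4 = 2976.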

Now we send an ICMPv4 echo request with a payload of 2948 bytes from client to
server over the PPP link. This results in the following network packet:

   2948 (echo payload)
 +    8 (ICMPv4 header)
 +   20 (IPv4 header)
---------------------
   2976 (PPP payload)
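The sketch below reproduces the same sum. It also adds the protocol field that ppp_mp_explode() prepends, which is one byte here because protocol compression is in effect. The helper names are hypothetical, and the 20-byte IPv4 header assumes no IP options.

```c
#define ICMP_HDR_LEN  8   /* ICMPv4 echo header */
#define IPV4_HDR_LEN 20   /* IPv4 header without options */

/* Length of the IPv4 packet produced by "ping -s <icmp_payload>". */
static int ipv4_len_for_ping(int icmp_payload)
{
	return icmp_payload + ICMP_HDR_LEN + IPV4_HDR_LEN;
}

/* Bytes handed to the MP fragmenter: the IP packet plus the PPP protocol
 * field (1 byte with protocol compression, 2 bytes otherwise). */
static int mp_input_len(int ip_len, int proto_compressed)
{
	return ip_len + (proto_compressed ? 1 : 2);
}
```

ipv4_len_for_ping(2948) is 2976. mp_input_len(2976, 1) is 2977, which is one byte more than the MRRU.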

These 2976 bytes do not exceed the MTU of the PPP network interface, so the
IP packet is not fragmented. Now the multilink PPP code in ppp_mp_explode()
prepends one protocol byte (0x21 for IPv4), making the packet (2977 bytes) one
byte bigger than the negotiated MRRU. This packet would therefore have to be
divided into three fragments, since two fragments of at most 1488 bytes each
(link MTU 1492 minus the 4-byte MP header) hold only 2976 bytes. But this does
not happen, because each link MTU is assumed to be two bytes larger than it
is. Instead, the packet is divided into only two fragments, one of 1489 bytes
and one of 1488 bytes. For the bigger fragment we get:

   1489 (PPP payload)
 +    4 (MP header)
 +    2 (PPP protocol field for the MP payload (0x3d))
 +    6 (PPPoE header)
--------------------------
   1501 (Ethernet payload)

This packet exceeds the Ethernet link MTU of 1500 bytes and is discarded.
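The following is a simplified model of how the payload is split across the two links. It is not the actual ppp_mp_explode() loop. Each of the first n_links fragments takes an even share (rounded up) of the remaining bytes, capped at the per-fragment limit, and any leftover goes into further fragments. The function name mp_split() is hypothetical.

Under these assumptions the model reproduces the sizes discussed here. With the too-large limit of 1490 it yields 1489 and 1488 bytes. With the correct limit of 1488 it yields 1488, 1488 and 1 bytes, which matches the fragment lengths in the tcpdump output after the fix.

```c
/*
 * Simplified model (not the real ppp_mp_explode() loop) of splitting
 * `total` bytes over `n_links` links with a per-fragment limit `frag_mtu`.
 * Writes at most `max_frags` sizes into `sizes` and returns how many
 * fragments were produced, or -1 on invalid arguments.
 */
static int mp_split(int total, int n_links, int frag_mtu,
		    int *sizes, int max_frags)
{
	int count = 0;
	int free_links = n_links;

	if (total < 0 || n_links <= 0 || frag_mtu <= 0 || max_frags <= 0)
		return -1;

	while (total > 0 && count < max_frags) {
		int flen;

		if (free_links > 0) {
			/* even share of what is left, rounded up */
			flen = total / free_links + (total % free_links != 0);
			free_links--;
		} else {
			/* every link already carries one fragment */
			flen = total;
		}
		if (flen > frag_mtu)
			flen = frag_mtu;	/* never exceed the per-fragment limit */

		sizes[count++] = flen;
		total -= flen;
	}
	return count;
}
```

In this model, the 1489-byte fragment is the one that becomes the 1501-byte Ethernet payload shown above.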

If one configures the link MTU on the client side to 1501, one can see the
discarded Ethernet frames with tcpdump running on the client. A

ping -s 2948 -c 1 192.168.15.254

leads to the smaller fragment that is correctly received on the server side:

(tcpdump -vvvne -i eth3 pppoes and ppp proto 0x3d)
52:54:00:ad:87:fd > 52:54:00:79:5c:d0, ethertype PPPoE S (0x8864),
  length 1514: PPPoE  [ses 0x3] MLPPP (0x003d), length 1494: seq 0x000,
  Flags [end], length 1492

and to the bigger fragment that is not received on the server side:

(tcpdump -vvvne -i eth2 pppoes and ppp proto 0x3d)
52:54:00:70:9e:89 > 52:54:00:5d:6f:b0, ethertype PPPoE S (0x8864),
  length 1515: PPPoE  [ses 0x5] MLPPP (0x003d), length 1495: seq 0x000,
  Flags [begin], length 1493
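The length fields in these tcpdump lines can be checked by adding the headers back onto the fragment. The sketch below does that. The struct and function names are hypothetical. The mapping of each field to a tcpdump "length" value is an interpretation that is consistent with the numbers shown, assuming an Ethernet header without a VLAN tag and without the FCS.

```c
#define MP_HDR_LEN     4  /* MP header with long sequence numbers */
#define PPP_PROTO_LEN  2  /* PPP protocol field (0x003d) in front of the MP header */
#define PPPOE_HDR_LEN  6  /* PPPoE session header */
#define ETH_HDR_LEN   14  /* destination MAC, source MAC, ethertype; no VLAN tag, no FCS */

/* Hypothetical container for the lengths tcpdump prints for one fragment. */
struct frag_wire_lens {
	int mp;          /* trailing "length": MP header + fragment data */
	int mlppp;       /* "MLPPP (0x003d), length": protocol field + MP part */
	int eth_payload; /* PPPoE header + PPP frame, compared against the link MTU */
	int frame;       /* "length" on the Ethernet line of tcpdump -e */
};

static struct frag_wire_lens wire_lens_for_fragment(int frag_len)
{
	struct frag_wire_lens l;

	l.mp = frag_len + MP_HDR_LEN;
	l.mlppp = l.mp + PPP_PROTO_LEN;
	l.eth_payload = l.mlppp + PPPOE_HDR_LEN;
	l.frame = l.eth_payload + ETH_HDR_LEN;
	return l;
}
```

For the 1489-byte fragment this gives 1493, 1495, 1501 and 1515. For the 1488-byte fragment it gives 1492, 1494, 1500 and 1514. For a 1-byte fragment it gives 5, 7, 13 and 27, which are also the lengths of the third fragment seen after the fix.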

With the patch below, we correctly obtain three fragments:

52:54:00:ad:87:fd > 52:54:00:79:5c:d0, ethertype PPPoE S (0x8864),
  length 1514: PPPoE  [ses 0x1] MLPPP (0x003d), length 1494: seq 0x000,
  Flags [begin], length 1492
52:54:00:70:9e:89 > 52:54:00:5d:6f:b0, ethertype PPPoE S (0x8864),
  length 1514: PPPoE  [ses 0x1] MLPPP (0x003d), length 1494: seq 0x000,
  Flags [none], length 1492
52:54:00:ad:87:fd > 52:54:00:79:5c:d0, ethertype PPPoE S (0x8864),
  length 27: PPPoE  [ses 0x1] MLPPP (0x003d), length 7: seq 0x000,
  Flags [end], length 5

And the ICMPv4 echo request is successfully received at the server side:

IP (tos 0x0, ttl 64, id 21925, offset 0, flags [DF], proto ICMP (1),
  length 2976)
    192.168.222.2 > 192.168.15.254: ICMP echo request, id 30530, seq 0,
      length 2956

The bug dates back to commit c9aa689537 ("[PPPOE]: Advertise PPPoE MTU"),
i.e. it has been present from the very beginning. This patch applies to 3.10
and later, but the fix can be applied (with minor modifications) to kernels as
old as 2.6.32.

Signed-off-by: Christoph Schulz <develop@kristov.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-14 14:35:46 -07:00
appletalk
arcnet
bonding bonding: Advertize vxlan offload features when supported 2014-06-18 16:49:51 -07:00
caif
can slcan: Port write_wakeup deadlock fix from slip 2014-06-16 21:29:13 -07:00
cris
dsa net: dsa: update DSA drivers to use ds_to_priv 2014-04-30 13:31:25 -04:00
ethernet net: bcmgenet: fix RGMII_MODE_EN bit 2014-07-13 22:55:37 -07:00
fddi defxx: Fix !DYNAMIC_BUFFERS compilation warnings 2014-07-02 18:26:29 -07:00
hamradio
hippi
hyperv hyperv: fix apparent cut-n-paste error in send path teardown 2014-06-16 21:36:13 -07:00
ieee802154 at86rf230: fix irq setup 2014-06-22 18:04:03 -07:00
irda Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2014-06-12 14:27:40 -07:00
phy dp83640: Always decode received status frames 2014-07-09 17:00:34 -07:00
plip
ppp net: pppoe: use correct channel MTU when using Multilink PPP 2014-07-14 14:35:46 -07:00
slip slip: Fix deadlock in write_wakeup 2014-06-16 21:29:12 -07:00
team Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2014-06-03 23:32:12 -07:00
usb r8152: fix r8152_csum_workaround function 2014-07-11 14:51:21 -07:00
vmxnet3 vmxnet3: adjust ring sizes when interface is down 2014-06-16 21:26:40 -07:00
wan farsync: fix invalid memory accesses in fst_add_one() and fst_init_card() 2014-07-11 13:34:48 -07:00
wimax net: wimax: i2400m: control.c: Cleaning up conjunction always evaluates to false 2014-06-11 00:13:16 -07:00
wireless rt2800usb: Don't perform DMA from stack 2014-07-07 15:04:34 -04:00
xen-netback xen-netback: bookkeep number of active queues in our own module 2014-06-25 15:59:47 -07:00
dummy.c
eql.c
ifb.c
Kconfig
LICENSE.SRC
loopback.c
macvlan.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2014-06-11 16:02:55 -07:00
macvtap.c mactap: Fix checksum errors for non-gso packets in bridge mode 2014-04-30 16:12:22 -04:00
Makefile
mdio.c
mii.c
netconsole.c
nlmon.c
ntb_netdev.c net: use ethtool_cmd_speed_set helper to set ethtool speed value 2014-06-06 16:24:07 -07:00
rionet.c net: get rid of SET_ETHTOOL_OPS 2014-05-13 17:43:20 -04:00
sb1000.c
Space.c
sungem_phy.c
tun.c net-tun: restructure tun_do_read for better sleep/wakeup efficiency 2014-05-21 15:50:28 -04:00
veth.c
virtio_net.c net: get rid of SET_ETHTOOL_OPS 2014-05-13 17:43:20 -04:00
vxlan.c vxlan: Checksum fixes 2014-06-15 01:00:50 -07:00
xen-netfront.c xen-netfront: call netif_carrier_off() only once when disconnecting 2014-07-08 11:21:03 -07:00