Commit graph

39,145 commits

Author SHA1 Message Date
David S. Miller
a27cc68b9b We have a single bugfix for an invalid memory read.
-----BEGIN PGP SIGNATURE-----
 
 iQIcBAABCAAGBQJVzc58AAoJEDBSmw7B7bqrhkQP/1bgnJHOtAthUygVCUDN3p6J
 rbmyfmszcWqPgYWMh4sfBRJQPWBpq4Q4ZPlVNCU8VkFoJRXemqWsOjnpMro0Vi8Y
 omTMPf1ph5WAUhCTjR6d+J7M5uHVzp1ougeaDEHY6QPTQ0rNTQ4W9gR+TFRSSTG2
 gVIuledUXQgkBW6nm3mT3bxSwMkK6PnPy6Dpkzm4VFt3JtO0ZpSV+RzMLifmNf1r
 25lYiUUyimWbJGvZCMV9O2qKWwb+9AIYrItW74yUV2dRqqBZmP+IqJQ2p2mTWcjE
 YBToZA5Yrd6iSfuKC/wd2kt3NkmsGdNGwFSy4tI6WjBkKKLygOKBvcpBzcNDgv06
 ZB4mcyfac+KPiDA1QFvwV+OrU1u63wWDyfjkGCVkoq6WJkUxUZ4JVOq/7LolrxK8
 d0mJojOMjTlr2Q+7KMINl9HDw0MhkYqY/fJv+AHOlcIGc2bVrJF0n9fWsc9gKdPu
 c+2lIJLxzuojapXJXj+XpF6iPjh92F83W6iPbXZfu6AzvYQwGwve87CckurwElYY
 bCDV0nXzmn5DYwEOuKtiflbmLW0qlXbjhX27B4yHmXsS3/Q7M/pMUy3mj4pchmxn
 Iwr8pVoCgn8Z1hDJvyYPb4l0ERjY/MZghBJg+Yp4Q7CEWGOmyIPdi0visA9FN/xk
 ekOYrjW9Jz3AlgMDPVCC
 =uehl
 -----END PGP SIGNATURE-----

Merge tag 'mac80211-for-davem-2015-08-14' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211

Johannes Berg says:

====================
We have a single bugfix for an invalid memory read.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-17 14:03:10 -07:00
Calvin Owens
5d37852bf7 Revert "net: limit tcp/udp rmem/wmem to SOCK_{RCV,SND}BUF_MIN"
Commit 8133534c76 ("net: limit tcp/udp rmem/wmem to
SOCK_{RCV,SND}BUF_MIN") modified four sysctls to enforce that the values
written to them are not less than SOCK_MIN_{RCV,SND}BUF.

That change causes 4096 to no longer be accepted as a valid value for
'min' in tcp_wmem and udp_wmem_min. 4096 has been the default for both
of those sysctls for a long time, and unfortunately seems to be an
extremely popular setting. This change breaks a large number of sysctl
configurations at Facebook.

That commit referred to b1cb59cf2e ("net: sysctl_net_core: check
SNDBUF and RCVBUF for min length"), which choose to use the SOCK_MIN
constants as the lower limits to avoid nasty bugs. But AFAICS, a limit
of SOCK_MIN_SNDBUF isn't necessary to do that: the BUG_ON cited in the
commit message seems to have happened because unix_stream_sendmsg()
expects a minimum of a full page (ie SK_MEM_QUANTUM) and the math broke,
not because it had less than SOCK_MIN_SNDBUF allocated.

This particular issue doesn't seem to affect TCP however: using a
setting of "1 1 1" for tcp_{r,w}mem works, although it's obviously
suboptimal. SK_MEM_QUANTUM would be a nice minimum, but it's 64K on
some archs, so there would still be breakage.

Since a value of one doesn't seem to cause any problems, we can drop the
minimum 8133534c added to fix this.

This reverts commit 8133534c76.

Fixes: 8133534c76 ("net: limit tcp/udp rmem/wmem to SOCK_MIN...")
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Sorin Dumitru <sorin@returnze.ro>
Signed-off-by: Calvin Owens <calvinowens@fb.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-17 12:10:30 -07:00
Phil Sutter
4b46995568 net: sch_generic: react upon IFF_NO_QUEUE flag
Handle IFF_NO_QUEUE as alternative to tx_queue_len being zero.

Signed-off-by: Phil Sutter <phil@nwl.cc>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-17 11:50:18 -07:00
Trond Myklebust
37bfcc14b2 Merge branch 'bugfixes'
* bugfixes:
  SUNRPC: Fix a thinko in xs_connect()
  NFSv4.1/pNFS: Fix borken function _same_data_server_addrs_locked()
  NFS: nfs_set_pgio_error sometimes misses errors
2015-08-17 13:36:22 -05:00
Trond Myklebust
eed889b161 NFS: NFS over RDMA Client Side Changes
These patches improve both client performance and scalability, most notably
 by increasing the maixmum allowed rsize and wsize and by increasing the number
 of RDMA "credits".  There are also several bugfixes, such as correcting how
 WRITE compounds are encoded and fixing large NFS symlink operations.
 
 Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABCAAGBQJVy5O3AAoJENfLVL+wpUDr4FsP/RyDj8wZiHmLy5LGlcuIK7lJ
 mOyblxvA1DdADNzHUk9ORG3T6+I2Be6K6gXgD4va+9tmcBAvUEI25OhnKnmau4U+
 mK9ZYUQWISKdd9FHkX33PmgnspxBP0DdS/rO9GeFRh9EhjGmpHQzOb6BbUDe04wA
 PEll9LKZM+NTzOzJL8X5kDA9rXHGLo1T+aN5OMA0EQqxRDurXgLvZXZ5YzLPzgzW
 52KdBIRr0ijFtXeIB87B/ATqEetcDgMRkxB2vMuTbF1Jsc1Fu2xGj2KyncCda+X7
 6rBMB139zB3rBGG2szWfXU733jZmID09esvSHiC5oM6KYVPMxlF9ktvs+dV8g07k
 t+svp4Y1LhHEojfuOKA9arEG7URdlBOUvqwF/UfwFOfcdYR2d2hyfzO/VtsdxhkB
 O+kNvrnu2WsWgiyzEVPw2K0d0I5pAQSSxR2lZKzOYFjpwwjqjuvX+xYeC35BvJHv
 raGvIvdnPBqd9r9DO/v7kFv4xSNLpyi8o0CLtRowWN29LRVJqhz8gWOttBWOJ/1r
 IJYRDMB+R3kWFKe99/ev3IUslV8hekqbH4iYUGuPwFQNurSb69AdrdktsMWHP/8P
 1bMCidNCcNUjui2kTX7HwTRWjjjXwK3+RTSToe6t9iKxMYtLSif4m3K7n/0dAzdK
 ip8Qpx18MuJHphNiXb79
 =7PAr
 -----END PGP SIGNATURE-----

Merge tag 'nfs-rdma-for-4.3' of git://git.linux-nfs.org/projects/anna/nfs-rdma

NFS: NFS over RDMA Client Side Changes

These patches improve both client performance and scalability, most notably
by increasing the maixmum allowed rsize and wsize and by increasing the number
of RDMA "credits".  There are also several bugfixes, such as correcting how
WRITE compounds are encoded and fixing large NFS symlink operations.

Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2015-08-17 13:33:58 -05:00
Trond Myklebust
99b1a4c32a SUNRPC: Fix a thinko in xs_connect()
It is rather pointless to test the value of transport->inet after
calling xs_reset_transport(), since it will always be zero, and
so we will never see any exponential back off behaviour.
Also don't force early connections for SOFTCONN tasks. If the server
disconnects us, we should respect the exponential backoff.

Cc: stable@vger.kernel.org # 4.0+
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-08-17 13:05:49 -05:00
Richard Alpe
8f8ff9135b tipc: don't sanity check non-existing TLV (NL compat)
A zero length payload means that no TLV (Type Length Value) data has
been passed. Prior to this patch a non-existing TLV could be sanity
checked with TLV_OK() resulting in random behavior where a user
sending an empty message occasionally got a incorrect "operation not
supported" message back.

Signed-off-by: Richard Alpe <richard.alpe@ericsson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-17 10:39:54 -07:00
Herbert Xu
de0ded77b9 ipsec: Replace seqniv with seqiv
Now that seqniv is identical with seqiv we no longer need it.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Steffen Klassert <steffen.klassert@secunet.com>
2015-08-17 16:53:42 +08:00
Christophe Ricard
adca3c38d8 nfc: netlink: Warning fix
When NFC_ATTR_VENDOR_DATA is not set, data_len is 0 and data is NULL.

Fixes the following warning:

net/nfc/netlink.c:1536:3: warning: 'data' may be used uninitialized
+in this function [-Wmaybe-uninitialized]
      return cmd->doit(dev, data, data_len);

Cc: stable@vger.kernel.org
Signed-off-by: Christophe Ricard <christophe-h.ricard@st.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2015-08-17 10:45:19 +02:00
Eric Dumazet
1e31367899 ipv4: fix refcount leak in fib_check_nh()
fib_lookup() forces FIB_LOOKUP_NOREF flag, while fib_table_lookup()
does not.

This patch solves the typical message at reboot time or device
dismantle :

unregister_netdevice: waiting for eth0 to become free. Usage count = 4

Fixes: 3bfd847203 ("net: Use passed in table for nexthop lookups")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: David Ahern <dsa@cumulusnetworks.com>
Acked-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-16 22:14:32 -07:00
Christophe Ricard
fe202fe955 nfc: netlink: Add check on NFC_ATTR_VENDOR_DATA
NFC_ATTR_VENDOR_DATA is an optional vendor_cmd argument.
The current code was potentially using a non existing argument
leading to potential catastrophic results.

Cc: stable@vger.kernel.org
Signed-off-by: Christophe Ricard <christophe-h.ricard@st.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2015-08-17 01:36:17 +02:00
Alexander Aring
c0015bf3a3 ieee802154: 6lowpan: fix non-lowpan wpan interfaces
We receive all 802.15.4 frames on the packet handler "lowpan_rcv" this
patch checks if the wpan device belongs to a lowpan interface.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-08-15 23:28:09 +02:00
Alexander Aring
0751272880 ieee802154: 6lowpan: fix packet layer registration
This patch fixes 802.15.4 packet layer registration when mutliple
lowpan interfaces will be added. We need to register the packet layer at
the first lowpan interface and deregister it at the last interface. This
done by open_count variable which is protected by rtnl.

Additional do a quiet fix by adding dev_put(real_dev) when netdev
registration fails, which fix the refcount for the wpan dev.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-08-15 23:28:09 +02:00
Linus Lüssing
53cf037bf8 batman-adv: Fix potentially broken skb network header access
The two commits noted below added calls to ip_hdr() and ipv6_hdr(). They
need a correctly set skb network header.

Unfortunately we cannot rely on the device drivers to set it for us.
Therefore setting it in the beginning of the according ndo_start_xmit
handler.

Fixes: 1d8ab8d3c1 ("batman-adv: Modified forwarding behaviour for multicast packets")
Fixes: ab49886e3d ("batman-adv: Add IPv4 link-local/IPv6-ll-all-nodes multicast support")
Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
2015-08-14 22:52:10 +02:00
Simon Wunderlich
3f1e08d0ae batman-adv: remove broadcast packets scheduled for purged outgoing if
When an interface is purged, the broadcast packets scheduled for this
interface should get purged as well.

Signed-off-by: Simon Wunderlich <simon@open-mesh.com>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
2015-08-14 22:52:09 +02:00
Marek Lindner
1f15510164 batman-adv: protect tt request from double deletion
The list_del() calls were changed to list_del_init() to prevent
an accidental double deletion in batadv_tt_req_node_new().

Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
2015-08-14 22:52:09 +02:00
Linus Lüssing
8a4023c5b5 batman-adv: Fix potential synchronization issues in mcast tvlv handler
So far the mcast tvlv handler did not anticipate the processing of
multiple incoming OGMs from the same originator at the same time. This
can lead to various issues:

* Broken refcounting: For instance two mcast handlers might both assume
  that an originator just got multicast capabilities and will together
  wrongly decrease mcast.num_disabled by two, potentially leading to
  an integer underflow.

* Potential kernel panic on hlist_del_rcu(): Two mcast handlers might
  one after another try to do an
  hlist_del_rcu(&orig->mcast_want_all_*_node). The second one will
  cause memory corruption / crashes.
  (Reported by: Sven Eckelmann <sven@narfation.org>)

Right in the beginning the code path makes assumptions about the current
multicast related state of an originator and bases all updates on that. The
easiest and least error prune way to fix the issues in this case is to
serialize multiple mcast handler invocations with a spinlock.

Fixes: 60432d756c ("batman-adv: Announce new capability via multicast TVLV")
Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
2015-08-14 22:52:08 +02:00
Linus Lüssing
9c936e3f4c batman-adv: Make MCAST capability changes atomic
Bitwise OR/AND assignments in C aren't guaranteed to be atomic. One
OGM handler might undo the set/clear of a specific bit from another
handler run in between.

Fix this by using the atomic set_bit()/clear_bit()/test_bit() functions.

Fixes: 60432d756c ("batman-adv: Announce new capability via multicast TVLV")
Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
2015-08-14 22:51:40 +02:00
Linus Lüssing
ac4eebd484 batman-adv: Make TT capability changes atomic
Bitwise OR/AND assignments in C aren't guaranteed to be atomic. One
OGM handler might undo the set/clear of a specific bit from another
handler run in between.

Fix this by using the atomic set_bit()/clear_bit()/test_bit() functions.

Fixes: e17931d1a6 ("batman-adv: introduce capability initialization bitfield")
Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
2015-08-14 22:50:43 +02:00
Linus Lüssing
4635469f5c batman-adv: Make NC capability changes atomic
Bitwise OR/AND assignments in C aren't guaranteed to be atomic. One
OGM handler might undo the set/clear of a specific bit from another
handler run in between.

Fix this by using the atomic set_bit()/clear_bit()/test_bit() functions.

Fixes: 3f4841ffb3 ("batman-adv: tvlv - add network coding container")
Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
2015-08-14 22:50:43 +02:00
Linus Lüssing
65d7d46050 batman-adv: Make DAT capability changes atomic
Bitwise OR/AND assignments in C aren't guaranteed to be atomic. One
OGM handler might undo the set/clear of a specific bit from another
handler run in between.

Fix this by using the atomic set_bit()/clear_bit()/test_bit() functions.

Fixes: 17cf0ea455 ("batman-adv: tvlv - add distributed arp table container")
Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
2015-08-14 22:50:42 +02:00
Johannes Berg
40d9a38ad3 mac80211: use DECLARE_EWMA
Instead of using the out-of-line average calculation, use the new
DECLARE_EWMA() macro to declare a signal EWMA, and use that.

This actually *reduces* the code size slightly (on x86-64) while
also reducing the station info size by 80 bytes.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-08-14 17:49:53 +02:00
Lorenzo Bianconi
b119ad6e72 mac80211: add rate mask logic for vht rates
Define rc_rateidx_vht_mcs_mask array and rate_idx_match_vht_mcs_mask()
method in order to apply mcs mask for vht rates

Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi83@gmail.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-08-14 17:49:51 +02:00
Lorenzo Bianconi
e910867bd2 mac80211: define rate_control_apply_mask_ratetbl()
Define rate_control_apply_mask_ratetbl() in order to apply ratemask in
rate_control_set_rates() for station rate table

Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi83@gmail.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-08-14 17:49:51 +02:00
Lorenzo Bianconi
90c66bd223 mac80211: remove ieee80211_tx_rate dependency in rate mask code
Remove ieee80211_tx_rate dependency in rate_idx_match_legacy_mask(),
rate_idx_match_mcs_mask() and rate_idx_match_mask() in order to use the
previous logic to define a ratemask in rate_control_set_rates() for
station rate table. Moreover move rate mask definition logic in
rate_control_cap_mask()

Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi83@gmail.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-08-14 17:49:50 +02:00
Lorenzo Bianconi
35225eb7a5 mac80211: remove ieee80211_tx_info from rate_control_apply_mask signature
Remove unnecessary ieee80211_tx_info pointer from rate_control_apply_mask
signature. rate_control_apply_mask() will be used to define a ratemask in
rate_control_set_rates() for station rate table

Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi83@gmail.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-08-14 17:49:50 +02:00
Bertold Van den Bergh
4b819f6cc4 mac80211: Make OCB mode set BSSID
Perform the BSS_CHANGED_BSSID action when joining an OCB network.
This is required to set the broadcast BSSID in some network drivers.

Signed-off-by: Bertold Van den Bergh <bertold.vandenbergh@esat.kuleuven.be>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-08-14 17:49:49 +02:00
Bertold Van den Bergh
cc11729893 mac80211: Only accept data frames in OCB mode
Currently OCB mode accepts frames with bssid==broadcast and type!=beacon.
Some non-data frames are sent matching this, for example probe responses.
This results in unnecessary creation of STA entries.

Signed-off-by: Bertold Van den Bergh <bertold.vandenbergh@esat.kuleuven.be>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-08-14 17:49:49 +02:00
Bertold Van den Bergh
5765f9f66e mac80211: Set txrc.bss to true for OCB interfaces
To make mac80211 accept the multicast rate requested by the user the
rate control should be told that it is operating in BSS mode.
Without this, the default rate is selected in rate_control_send_low
(!pubsta and !txrc->bss)

Signed-off-by: Bertold Van den Bergh <bertold.vandenbergh@esat.kuleuven.be>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-08-14 17:49:48 +02:00
Bertold Van den Bergh
876dc9308e nl80211: Allow setting multicast rate on OCB interfaces
Allow setting multicast rate on OCB interfaces.
Current behaviour results in EOPNOTSUPP when attempting this.

Signed-off-by: Bertold Van den Bergh <bertold.vandenbergh@esat.kuleuven.be>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-08-14 17:49:48 +02:00
Michal Kazior
9189ee31df cfg80211: propagate set_wiphy failure to userspace
If driver failed to setup wiphy params (e.g. rts
threshold, fragmentation treshold) userspace
wasn't properly notified about this. This could
lead to user confusion who would think the command
succeeded even if that wasn't the case.

Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-08-14 17:49:47 +02:00
Matthias May
4edd56981c cfg80211: regulatory: handle 5 and 10 MHz channels properly
The original assumption of 20MHz wide channels hasn't been true since
the addition of support for 5 and 10 MHz channels.
Change the code to no longer disable all channels that don't fit into
the 20MHz grid, but instead set the appropriate flags to disable
operation on specific bandwidths.

Signed-off-by: Matthias May <matthias.may@neratec.com>
[reword commit message]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-08-14 17:48:46 +02:00
Eric Dumazet
83fccfc394 inet: fix potential deadlock in reqsk_queue_unlink()
When replacing del_timer() with del_timer_sync(), I introduced
a deadlock condition :

reqsk_queue_unlink() is called from inet_csk_reqsk_queue_drop()

inet_csk_reqsk_queue_drop() can be called from many contexts,
one being the timer handler itself (reqsk_timer_handler()).

In this case, del_timer_sync() loops forever.

Simple fix is to test if timer is pending.

Fixes: 2235f2ac75 ("inet: fix races with reqsk timers")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-13 22:46:22 -07:00
David Ahern
9972f134a2 net: frags: Add VRF device index to cache and lookup
Fragmentation cache uses information from the IP header to reassemble
packets. That information can be duplicated across VRFs -- same source
and destination addresses, protocol and id. Handle fragmentation with
VRFs by adding the VRF device index to entries in the cache and the
lookup arg.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-13 22:43:21 -07:00
David Ahern
f7ba868b71 net: Use VRF index for oif in ip_send_unicast_reply
If output device is not specified use VRF device if input device is
enslaved. This is needed to ensure tcp acks and resets go out VRF device.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-13 22:43:21 -07:00
David Ahern
3bfd847203 net: Use passed in table for nexthop lookups
If a user passes in a table for new routes use that table for nexthop
lookups. Specifically, this solves the case where a connected route does
not exist in the main table, but only another table and then a subsequent
route is added with a next hop using the connected route. ie.,

$ ip route ls
default via 10.0.2.2 dev eth0
10.0.2.0/24 dev eth0  proto kernel  scope link  src 10.0.2.15
169.254.0.0/16 dev eth0  scope link  metric 1003
192.168.56.0/24 dev eth1  proto kernel  scope link  src 192.168.56.51

$ ip route ls table 10
1.1.1.0/24 dev eth2  scope link

Without this patch adding a nexthop route fails:

$ ip route add table 10 2.2.2.0/24 via 1.1.1.10
RTNETLINK answers: Network is unreachable

With this patch the route is added successfully.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-13 22:43:21 -07:00
David Ahern
021dd3b8a1 net: Add routes to the table associated with the device
When a device associated with a VRF is brought up or down routes
should be added to/removed from the table associated with the VRF.
fib_magic defaults to using the main or local tables. Have it use
the table with the device if there is one.

A part of this is directing prefsrc validations to the correct
table as well.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-13 22:43:21 -07:00
David Ahern
30bbaa1950 net: Fix up inet_addr_type checks
Currently inet_addr_type and inet_dev_addr_type expect local addresses
to be in the local table. With the VRF device local routes for devices
associated with a VRF will be in the table associated with the VRF.
Provide an alternate inet_addr lookup to use a specific table rather
than defaulting to the local table.

inet_addr_type_dev_table keeps the same semantics as inet_addr_type but
if the passed in device is enslaved to a VRF then the table for that VRF
is used for the lookup.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-13 22:43:21 -07:00
David Ahern
15be405eb2 net: Add inet_addr lookup by table
Currently inet_addr_type and inet_dev_addr_type expect local addresses
to be in the local table. With the VRF device local routes for devices
associated with a VRF will be in the table associated with the VRF.
Provide an alternate inet_addr lookup to use a specific table rather
than defaulting to the local table.

Signed-off-by: Shrijeet Mukherjee <shm@cumulusnetworks.com>
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-13 22:43:21 -07:00
David Ahern
9a24abfa42 udp: Handle VRF device in sendmsg
For unconnected UDP sockets using a VRF device lookup source address
based on VRF table. This allows the UDP header to be properly setup
before showing up at the VRF device via the dst.

Signed-off-by: Shrijeet Mukherjee <shm@cumulusnetworks.com>
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-13 22:43:20 -07:00
David Ahern
613d09b30f net: Use VRF device index for lookups on TX
As with ingress use the index of VRF master device for route lookups on
egress. However, the oif should only be used to direct the lookups to a
specific table. Routes in the table are not based on the VRF device but
rather interfaces that are part of the VRF so do not consider the oif for
lookups within the table. The FLOWI_FLAG_VRFSRC is used to control this
latter part.

Signed-off-by: Shrijeet Mukherjee <shm@cumulusnetworks.com>
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-13 22:43:20 -07:00
David Ahern
cd2fbe1b6b net: Use VRF device index for lookups on RX
On ingress use index of VRF master device for route lookups if real device
is enslaved. Rules are expected to be installed for the VRF device to
direct lookups to a specific table.

Signed-off-by: Shrijeet Mukherjee <shm@cumulusnetworks.com>
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-13 22:43:20 -07:00
Andy Gospodarek
0344338bd8 net: addr IFLA_OPERSTATE to netlink message for ipv6 ifinfo
This is useful information to include in ipv6 netlink messages that
report interface information.  IFLA_OPERSTATE is already included in
ipv4 messages, but missing for ipv6.  This closes that gap.

Signed-off-by: Andy Gospodarek <gospo@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-13 22:35:30 -07:00
Sasha Levin
da65ad1fe3 net: allow sleeping when modifying store_rps_map
Commit 10e4ea751 ("net: Fix race condition in store_rps_map") has moved the
manipulation of the rps_needed jump label under a spinlock. Since changing
the state of a jump label may sleep this is incorrect and causes warnings
during runtime.

Make rps_map_lock a mutex to allow sleeping under it.

Fixes: 10e4ea751 ("net: Fix race condition in store_rps_map")
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Acked-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-13 21:40:57 -07:00
Vivien Didelot
1114953615 net: dsa: add support for switchdev VLAN objects
Add new functions in DSA drivers to access hardware VLAN entries through
SWITCHDEV_OBJ_PORT_VLAN objects:

 - port_pvid_get() and vlan_getnext() to dump a VLAN
 - port_vlan_del() to exclude a port from a VLAN
 - port_pvid_set() and port_vlan_add() to join a port to a VLAN

The DSA infrastructure will ensure that each VLAN of the given range
does not already belong to another bridge. If it does, it will fallback
to software VLAN and won't program the hardware.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-13 21:31:12 -07:00
Andy Gospodarek
35103d1117 net: ipv6 sysctl option to ignore routes when nexthop link is down
Like the ipv4 patch with a similar title, this adds a sysctl to allow
the user to change routing behavior based on whether or not the
interface associated with the nexthop was an up or down link.  The
default setting preserves the current behavior, but anyone that enables
it will notice that nexthops on down interfaces will no longer be
selected:

net.ipv6.conf.all.ignore_routes_with_linkdown = 0
net.ipv6.conf.default.ignore_routes_with_linkdown = 0
net.ipv6.conf.lo.ignore_routes_with_linkdown = 0
...

When the above sysctls are set, not only will link status be reported to
userspace, but an indication that a nexthop is dead and will not be used
is also reported.

1000::/8 via 7000::2 dev p7p1  metric 1024 dead linkdown  pref medium
1000::/8 via 8000::2 dev p8p1  metric 1024  pref medium
7000::/8 dev p7p1  proto kernel  metric 256 dead linkdown  pref medium
8000::/8 dev p8p1  proto kernel  metric 256  pref medium
9000::/8 via 8000::2 dev p8p1  metric 2048  pref medium
9000::/8 via 7000::2 dev p7p1  metric 1024 dead linkdown  pref medium
fe80::/64 dev p7p1  proto kernel  metric 256 dead linkdown  pref medium
fe80::/64 dev p8p1  proto kernel  metric 256  pref medium

This also adds devconf support and notification when sysctl values
change.

v2: drop use of rt6i_nhflags since it is not needed right now

Signed-off-by: Andy Gospodarek <gospo@cumulusnetworks.com>
Signed-off-by: Dinesh Dutt <ddutt@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-13 21:27:19 -07:00
Andy Gospodarek
cea45e208d net: track link status of ipv6 nexthops
Add support to track current link status of ipv6 nexthops to match
recent changes that added support for ipv4 nexthops.  This takes a
simple approach to track linkdown status for next-hops and simply
checks the dev for the dst entry and sets proper flags that to be used
in the netlink message.

v2: drop use of rt6i_nhflags since it is not needed right now

Signed-off-by: Andy Gospodarek <gospo@cumulusnetworks.com>
Signed-off-by: Dinesh Dutt <ddutt@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-13 21:27:18 -07:00
Andy Whitcroft
25b97c016b ipv4: off-by-one in continuation handling in /proc/net/route
When generating /proc/net/route we emit a header followed by a line for
each route.  When a short read is performed we will restart this process
based on the open file descriptor.  When calculating the start point we
fail to take into account that the 0th entry is the header.  This leads
us to skip the first entry when doing a continuation read.

This can be easily seen with the comparison below:

  while read l; do echo "$l"; done </proc/net/route >A
  cat /proc/net/route >B
  diff -bu A B | grep '^[+-]'

On my example machine I have approximatly 10KB of route output.  There we
see the very first non-title element is lost in the while read case,
and an entry around the 8K mark in the cat case:

  +wlan0 00000000 02021EAC 0003 0 0 400 00000000 0 0 0
  -tun1  00C0AC0A 00000000 0001 0 0 950 00C0FFFF 0 0 0

Fix up the off-by-one when reaquiring position on continuation.

Fixes: 8be33e955c ("fib_trie: Fib walk rcu should take a tnode and key instead of a trie and a leaf")
BugLink: http://bugs.launchpad.net/bugs/1483440
Acked-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: Andy Whitcroft <apw@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-13 21:16:29 -07:00
Linus Lüssing
a516993f0a net: fix wrong skb_get() usage / crash in IGMP/MLD parsing code
The recent refactoring of the IGMP and MLD parsing code into
ipv6_mc_check_mld() / ip_mc_check_igmp() introduced a potential crash /
BUG() invocation for bridges:

I wrongly assumed that skb_get() could be used as a simple reference
counter for an skb which is not the case. skb_get() bears additional
semantics, a user count. This leads to a BUG() invocation in
pskb_expand_head() / kernel panic if pskb_may_pull() is called on an skb
with a user count greater than one - unfortunately the refactoring did
just that.

Fixing this by removing the skb_get() call and changing the API: The
caller of ipv6_mc_check_mld() / ip_mc_check_igmp() now needs to
additionally check whether the returned skb_trimmed is a clone.

Fixes: 9afd85c9e4 ("net: Export IGMP/MLD message validation code")
Reported-by: Brenden Blanco <bblanco@plumgrid.com>
Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-13 17:08:39 -07:00
Yuchung Cheng
b340b26454 tcp: TLP retransmits last if failed to send new packet
When TLP fails to send new packet because of receive window
limit, it should fall back to retransmit the last packet instead.

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Nandita Dukkipati <nanditad@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-13 16:52:20 -07:00