linux-uconsole/net
Chieh-Min Wang 3a1ce97938 netfilter: conntrack: fix cloned unconfirmed skb->_nfct race in __nf_conntrack_confirm
[ Upstream commit 13f5251fd1 ]

For bridge (br_flood) or broadcast/multicast packets, the skb may be
cloned while its conntrack is still unconfirmed, which breaks the rule
that an unconfirmed skb->_nfct is never shared.  With nfqueue running on
my system, the race is easily reproduced and produces the following
warning call trace:

[13257.707525] CPU: 0 PID: 12132 Comm: main Tainted: P        W       4.4.60 #7744
[13257.707568] Hardware name: Qualcomm (Flattened Device Tree)
[13257.714700] [<c021f6dc>] (unwind_backtrace) from [<c021bce8>] (show_stack+0x10/0x14)
[13257.720253] [<c021bce8>] (show_stack) from [<c0449e10>] (dump_stack+0x94/0xa8)
[13257.728240] [<c0449e10>] (dump_stack) from [<c022a7e0>] (warn_slowpath_common+0x94/0xb0)
[13257.735268] [<c022a7e0>] (warn_slowpath_common) from [<c022a898>] (warn_slowpath_null+0x1c/0x24)
[13257.743519] [<c022a898>] (warn_slowpath_null) from [<c06ee450>] (__nf_conntrack_confirm+0xa8/0x618)
[13257.752284] [<c06ee450>] (__nf_conntrack_confirm) from [<c0772670>] (ipv4_confirm+0xb8/0xfc)
[13257.761049] [<c0772670>] (ipv4_confirm) from [<c06e7a60>] (nf_iterate+0x48/0xa8)
[13257.769725] [<c06e7a60>] (nf_iterate) from [<c06e7af0>] (nf_hook_slow+0x30/0xb0)
[13257.777108] [<c06e7af0>] (nf_hook_slow) from [<c07f20b4>] (br_nf_post_routing+0x274/0x31c)
[13257.784486] [<c07f20b4>] (br_nf_post_routing) from [<c06e7a60>] (nf_iterate+0x48/0xa8)
[13257.792556] [<c06e7a60>] (nf_iterate) from [<c06e7af0>] (nf_hook_slow+0x30/0xb0)
[13257.800458] [<c06e7af0>] (nf_hook_slow) from [<c07e5580>] (br_forward_finish+0x94/0xa4)
[13257.808010] [<c07e5580>] (br_forward_finish) from [<c07f22ac>] (br_nf_forward_finish+0x150/0x1ac)
[13257.815736] [<c07f22ac>] (br_nf_forward_finish) from [<c06e8df0>] (nf_reinject+0x108/0x170)
[13257.824762] [<c06e8df0>] (nf_reinject) from [<c06ea854>] (nfqnl_recv_verdict+0x3d8/0x420)
[13257.832924] [<c06ea854>] (nfqnl_recv_verdict) from [<c06e940c>] (nfnetlink_rcv_msg+0x158/0x248)
[13257.841256] [<c06e940c>] (nfnetlink_rcv_msg) from [<c06e5564>] (netlink_rcv_skb+0x54/0xb0)
[13257.849762] [<c06e5564>] (netlink_rcv_skb) from [<c06e4ec8>] (netlink_unicast+0x148/0x23c)
[13257.858093] [<c06e4ec8>] (netlink_unicast) from [<c06e5364>] (netlink_sendmsg+0x2ec/0x368)
[13257.866348] [<c06e5364>] (netlink_sendmsg) from [<c069fb8c>] (sock_sendmsg+0x34/0x44)
[13257.874590] [<c069fb8c>] (sock_sendmsg) from [<c06a03dc>] (___sys_sendmsg+0x1ec/0x200)
[13257.882489] [<c06a03dc>] (___sys_sendmsg) from [<c06a11c8>] (__sys_sendmsg+0x3c/0x64)
[13257.890300] [<c06a11c8>] (__sys_sendmsg) from [<c0209b40>] (ret_fast_syscall+0x0/0x34)

The original code only triggered the warning and took no further action.
As a result, the shared conntrack was moved to the dying list and the
packet was dropped (nf_ct_resolve_clash returns NF_DROP for a dying
conntrack).

- Steps to reproduce:

+----------------------------+
|          br0(bridge)       |
|                            |
+-+---------+---------+------+
  | eth0|   | eth1|   | eth2|
  |     |   |     |   |     |
  +--+--+   +--+--+   +---+-+
     |         |          |
     |         |          |
  +--+-+     +-+--+    +--+-+
  | PC1|     | PC2|    | PC3|
  +----+     +----+    +----+

iptables -A FORWARD -m mark --mark 0x1000000/0x1000000 -j NFQUEUE --queue-num 100 --queue-bypass

Note: our nfq userspace program sets a mark on packets whose connection
has already been processed.

PC1 sends broadcast packets simulated by hping3:

hping3 --rand-source --udp 192.168.1.255 -i u100

- The call flow of the broadcast race is as follows:

br_handle_frame
  BR_HOOK(NFPROTO_BRIDGE, NF_BR_PRE_ROUTING, br_handle_frame_finish)
  // skb->_nfct (unconfirmed conntrack) is constructed at PRE_ROUTING stage
  br_handle_frame_finish
    // check if this packet is broadcast
    br_flood_forward
      br_flood
        list_for_each_entry_rcu(p, &br->port_list, list) // iterate through each port
          maybe_deliver
            deliver_clone
              skb = skb_clone(skb)
              __br_forward
                BR_HOOK(NFPROTO_BRIDGE, NF_BR_FORWARD,...)
                // queue in our nfq and received by our userspace program
                // goto __nf_conntrack_confirm with process context on CPU 1
    br_pass_frame_up
      BR_HOOK(NFPROTO_BRIDGE, NF_BR_LOCAL_IN,...)
      // goto __nf_conntrack_confirm with softirq context on CPU 0

Conntrack confirmation can happen at both the INPUT and POSTROUTING
stages, so with NFQUEUE running, clones whose skb->_nfct points to the
same unconfirmed conntrack can race on different cores.

This patch fixes the repeating kernel splat; it is now displayed only
once.
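
The behaviour described above can be illustrated with a small userspace
model. This is only a sketch, not the kernel change itself: it assumes the
fix re-checks the confirmed state while holding the lock, warns a single
time, and drops the clone that lost the race. That assumption and every
name below (model_ct, model_confirm, warn_once, warn_count) are hypothetical
and do not appear in the message above.

```c
/* Userspace model only; all names are hypothetical, not kernel APIs. */
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

enum verdict { VERDICT_ACCEPT, VERDICT_DROP };

struct model_ct {
	pthread_mutex_t lock;	/* stands in for the conntrack hash locks */
	bool confirmed;		/* stands in for the "confirmed" status bit */
};

static int warn_count;		/* how many times the warning was printed */

/* Print the warning the first time only, in the spirit of WARN_ON_ONCE(). */
static void warn_once(const char *msg)
{
	static atomic_flag warned = ATOMIC_FLAG_INIT;

	if (!atomic_flag_test_and_set(&warned)) {
		warn_count++;
		fprintf(stderr, "WARNING: %s\n", msg);
	}
}

/*
 * Model of confirming an entry that may be shared by cloned packets.
 * The first caller confirms it; any later caller sees that the entry
 * is already confirmed and gets VERDICT_DROP.
 */
static enum verdict model_confirm(struct model_ct *ct)
{
	enum verdict v = VERDICT_ACCEPT;

	assert(ct != NULL);

	pthread_mutex_lock(&ct->lock);
	if (ct->confirmed) {
		/* Another clone sharing this entry won the race. */
		warn_once("shared unconfirmed entry was already confirmed");
		v = VERDICT_DROP;
	} else {
		ct->confirmed = true;
	}
	pthread_mutex_unlock(&ct->lock);

	return v;
}
```

Calling model_confirm() repeatedly on the same entry accepts the first
call and drops the rest, while the warning is printed only for the first
losing call.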

Signed-off-by: Chieh-Min Wang <chiehminw@synology.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-04-05 22:33:08 +02:00
6lowpan 6lowpan: iphc: reset mac_header after decompress to fix panic 2018-07-06 12:32:12 +02:00
9p 9p/net: fix memory leak in p9_client_create 2019-03-23 20:09:38 +01:00
802
8021q net: remove blank lines at end of file 2018-07-24 14:10:43 -07:00
appletalk
atm Revert "net: simplify sock_poll_wait" 2018-11-04 14:50:51 +01:00
ax25 ax25: fix possible use-after-free 2019-02-23 09:07:27 +01:00
batman-adv batman-adv: release station info tidstats 2019-03-13 14:02:34 -07:00
bluetooth Bluetooth: Verify that l2cap_get_conf_opt provides large enough buffer 2019-04-03 06:26:14 +02:00
bpf bpf/test_run: support cgroup local storage 2018-08-03 00:47:32 +02:00
bpfilter net: bpfilter: use get_pid_task instead of pid_task 2018-10-17 22:03:40 -07:00
bridge netfilter: ebtables: remove BUGPRINT messages 2019-03-27 14:14:42 +09:00
caif Revert "net: simplify sock_poll_wait" 2018-11-04 14:50:51 +01:00
can can: bcm: check timer values before ktime conversion 2019-01-31 08:14:39 +01:00
ceph libceph: wait for latest osdmap in ceph_monc_blacklist_add() 2019-03-27 14:14:39 +09:00
core net-sysfs: call dev_hold if kobject_init_and_add success 2019-04-03 06:26:17 +02:00
dcb net: dcb: Add priority-to-DSCP map getters 2018-07-27 13:17:50 -07:00
dccp dccp: do not use ipv6 header for ipv4 flow 2019-04-03 06:26:15 +02:00
decnet decnet: fix using plain integer as NULL warning 2018-08-09 14:11:24 -07:00
dns_resolver net: remove blank lines at end of file 2018-07-24 14:10:43 -07:00
dsa net: dsa: slave: Don't propagate flag changes on down slave interfaces 2019-02-12 19:47:22 +01:00
ethernet
hsr net/hsr: fix possible crash in add_timer() 2019-03-19 13:12:38 +01:00
ieee802154 ieee802154: lowpan_header_create check must check daddr 2019-01-09 17:38:31 +01:00
ife
ipv4 netfilter: ipt_CLUSTERIP: fix warning unused variable cn 2019-03-23 20:09:57 +01:00
ipv6 ila: Fix rhashtable walker list corruption 2019-04-03 06:26:18 +02:00
iucv Revert "net: simplify sock_poll_wait" 2018-11-04 14:50:51 +01:00
kcm Revert "kcm: remove any offset before parsing messages" 2018-09-17 18:43:42 -07:00
key af_key: unconditionally clone on broadcast 2019-03-23 20:09:48 +01:00
l2tp l2tp: fix infoleak in l2tp_ip6_recvmsg() 2019-03-19 13:12:38 +01:00
l3mdev
lapb
llc llc: do not use sk_eat_skb() 2018-12-01 09:37:27 +01:00
mac80211 mac80211: Fix Tx aggregation session tear down with ITXQs 2019-03-23 20:09:45 +01:00
mac802154 net: mac802154: tx: expand tailroom if necessary 2018-08-06 11:21:37 +02:00
mpls mpls: Return error for RTA_GATEWAY attribute 2019-03-10 07:17:19 +01:00
ncsi net/ncsi: Fixup .dumpit message flags and ID check in Netlink handler 2018-08-22 21:39:08 -07:00
netfilter netfilter: conntrack: fix cloned unconfirmed skb->_nfct race in __nf_conntrack_confirm 2019-04-05 22:33:08 +02:00
netlabel netlabel: fix out-of-bounds memory accesses 2019-03-10 07:17:18 +01:00
netlink genetlink: Fix a memory leak on error path 2019-04-03 06:26:15 +02:00
netrom netrom: switch to sock timer API 2019-02-06 17:30:07 +01:00
nfc net: nfc: Fix NULL dereference on nfc_llcp_build_tlv fails 2019-03-10 07:17:18 +01:00
nsh nsh: set mac len based on inner packet 2018-07-12 16:55:29 -07:00
openvswitch openvswitch: Avoid OOB read when parsing flow nlattrs 2019-01-31 08:14:32 +01:00
packet packets: Always register packet sk in the same order 2019-04-03 06:26:17 +02:00
phonet phonet: fix building with clang 2019-03-23 20:09:51 +01:00
psample
qrtr net: qrtr: Reset the node and port ID of broadcast messages 2018-07-05 20:20:03 +09:00
rds rds: fix refcount bug in rds_sock_addref 2019-02-12 19:47:22 +01:00
rfkill Here are quite a large number of fixes, notably: 2018-09-03 22:12:02 -07:00
rose net: rose: fix a possible stack overflow 2019-04-03 06:26:17 +02:00
rxrpc rxrpc: Fix client call queueing, waiting for channel 2019-03-19 13:12:39 +01:00
sched net: sched: fix cleanup NULL pointer exception in act_mirr 2019-04-03 06:26:19 +02:00
sctp sctp: use memdup_user instead of vmemdup_user 2019-04-03 06:26:17 +02:00
smc net/smc: fix smc_poll in SMC_INIT state 2019-03-19 13:12:41 +01:00
strparser strparser: remove redundant variable 'rd_desc' 2018-08-01 10:00:06 -07:00
sunrpc svcrpc: fix UDP on servers with lots of threads 2019-03-23 20:10:10 +01:00
switchdev
tipc tipc: fix cancellation of topology subscriptions 2019-04-03 06:26:18 +02:00
tls net/tls: Init routines in create_ctx 2019-01-13 09:51:00 +01:00
unix missing barriers in some of unix_sock ->addr and ->path accesses 2019-03-19 13:12:41 +01:00
vmw_vsock vsock/virtio: reset connected sockets on device removal 2019-03-13 14:02:36 -07:00
wimax wimax: remove blank lines at EOF 2018-07-24 14:10:42 -07:00
wireless cfg80211: extend range deviation for DMG 2019-03-05 17:58:52 +01:00
x25 net/x25: fix a race in x25_bind() 2019-03-19 13:12:40 +01:00
xdp xsk: do not call synchronize_net() under RCU read lock 2018-10-11 10:19:01 +02:00
xfrm xfrm: Fix inbound traffic via XFRM interfaces across network namespaces 2019-03-23 20:09:49 +01:00
compat.c sock: Make sock->sk_stamp thread-safe 2019-01-09 17:38:33 +01:00
Kconfig net: remove blank lines at end of file 2018-07-24 14:10:43 -07:00
Makefile
socket.c net: socket: set sock->sk to NULL after calling proto_ops::release() 2019-03-10 07:17:18 +01:00
sysctl_net.c