Add SKF_AD_RXHASH and SKF_AD_CPU to the filter ancillary mechanism,
to be able to build advanced filters.
This can help spread packets across several sockets with a fast
selection, after RPS dispatch to N cpus for example, or catch a
percentage of flows in one queue.
tcpdump -s 500 "cpu = 1" :
[0] ld CPU
[1] jeq #1 jt 2 jf 3
[2] ret #500
[3] ret #0
# take 12.5 % of flows (average)
tcpdump -s 1000 "rxhash & 7 = 2" :
[0] ld RXHASH
[1] and #7
[2] jeq #2 jt 3 jf 4
[3] ret #1000
[4] ret #0
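For illustration only (not part of the patch), here is a minimal userspace
sketch of what the "rxhash & 7 = 2" program computes. rxhash_filter() and
count_accepted() are hypothetical names, and the hash is passed in directly
instead of being read from a packet.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace model of the five-instruction program listed above. */
static unsigned int rxhash_filter(uint32_t rxhash, unsigned int snaplen)
{
	uint32_t a = rxhash;		/* [0] ld RXHASH */

	assert(snaplen != 0);		/* 0 would be indistinguishable from "drop" */
	a &= 7;				/* [1] and #7 */
	if (a == 2)			/* [2] jeq #2 jt 3 jf 4 */
		return snaplen;		/* [3] ret #1000 */
	return 0;			/* [4] ret #0 */
}

/* Count how many of the hash values 0..n-1 the filter accepts. */
static unsigned int count_accepted(uint32_t n)
{
	unsigned int hits = 0;
	uint32_t h;

	for (h = 0; h < n; h++)
		if (rxhash_filter(h, 1000) != 0)
			hits++;
	return hits;
}
```

With uniformly distributed hashes one value out of eight matches, which is
where the 12.5 % average comes from.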
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Rui <wirelesser@gmail.com>
Acked-by: Changli Gao <xiaosuo@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
NETIF_F_HW_CSUM is a superset of NETIF_F_IP_CSUM+NETIF_F_IPV6_CSUM, but
some drivers miss the difference. Fix this and also fix UFO dependency
on checksumming offload as it makes the same mistake in assumptions.
Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Acked-by: Jon Mason <jon.mason@exar.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pavel Emelyanov tried to fix a race between sk_filter_(de|at)tach and
sk_clone() in commit 47e958eac2
Problem is we can have several clones sharing a common sk_filter, and
these clones might want to sk_filter_attach() their own filters at the
same time, and can overwrite old_filter->rcu, corrupting RCU queues.
We can not use filter->rcu without being sure no other thread could do
the same thing.
Switch code to a more conventional ref-counting technique : Do the
atomic decrement immediately and queue one rcu call back when last
reference is released.
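For illustration only, a minimal userspace sketch of this ref-counting
pattern using C11 atomics. struct filter_model and its helpers are
hypothetical stand-ins, and the RCU callback is modelled by a plain flag.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a shared filter object; not struct sk_filter. */
struct filter_model {
	atomic_int refcnt;
	bool release_queued;	/* models "one RCU callback was queued" */
};

static void filter_model_init(struct filter_model *f, int refs)
{
	assert(refs > 0);
	atomic_init(&f->refcnt, refs);
	f->release_queued = false;
}

/*
 * Drop one reference. The decrement happens immediately; only the caller
 * that takes the count to zero queues the (modelled) deferred release.
 * Returns true when this call released the last reference.
 */
static bool filter_model_put(struct filter_model *f)
{
	if (atomic_fetch_sub(&f->refcnt, 1) != 1)
		return false;
	f->release_queued = true;
	return true;
}
```

Because exactly one caller observes the transition to zero, the deferred
release is queued at most once, however many clones share the object.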
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since the skb head is allocated by kmalloc(), it might be larger than
what was actually requested because of discrete kmem cache sizes. Before
reallocating a new skb head, check if the current one has the needed
extra size.
Do this check only if skb head is not shared.
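For illustration only, a minimal sketch of the check, assuming power-of-two
size classes starting at 32 bytes (the real kmem cache sizes may differ).
model_alloc_size() and head_can_grow_in_place() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed size classes: powers of two starting at 32 bytes. */
static size_t model_alloc_size(size_t requested)
{
	size_t size = 32;

	assert(requested > 0);
	while (size < requested)
		size <<= 1;
	return size;
}

/* True when the slack left by the allocator already covers 'extra' bytes. */
static bool head_can_grow_in_place(size_t requested, size_t extra, bool shared)
{
	if (shared)
		return false;	/* a shared head must not be resized in place */
	return model_alloc_size(requested) - requested >= extra;
}
```

In this model a 100 byte request lands in a 128 byte object, so up to 28
extra bytes are available without reallocating.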
Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This will also improve handling of ipv6 tcp socket request
backlog when syncookies are not enabled. When backlog
becomes very deep, last quarter of backlog is limited to
validated destinations. Previously only ipv4 implemented
this logic, but now ipv6 does too.
Now we are only one step away from enabling timewait
recycling for ipv6, and that step is simply filling in
the implementation of tcp_v6_get_peer() and
tcp_v6_tw_get_peer().
Signed-off-by: David S. Miller <davem@davemloft.net>
Allocate qdisc memory according to NUMA properties of cpus included in
xps map.
To be effective, qdisc should be (re)setup after changes
of /sys/class/net/eth<n>/queues/tx-<n>/xps_cpus
I added a numa_node field in struct netdev_queue, containing NUMA node
if all cpus included in xps_cpus share same node, else -1.
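For illustration only, a minimal sketch of how such a numa_node value could
be derived. cpu_to_node[] is a hypothetical CPU to node table, bit c of
cpu_mask models CPU c being present in xps_cpus, and returning -1 for an
empty mask is an assumption of this sketch.

```c
#include <assert.h>
#include <limits.h>

static int xps_model_numa_node(unsigned long cpu_mask, const int *cpu_to_node,
			       int ncpus)
{
	int node = -1;
	int seen = 0;
	int cpu;

	assert(ncpus >= 0 &&
	       (unsigned long)ncpus <= sizeof(cpu_mask) * CHAR_BIT);
	for (cpu = 0; cpu < ncpus; cpu++) {
		if (!(cpu_mask & (1UL << cpu)))
			continue;
		if (!seen) {
			node = cpu_to_node[cpu];	/* first CPU sets the node */
			seen = 1;
		} else if (node != cpu_to_node[cpu]) {
			return -1;			/* CPUs span several nodes */
		}
	}
	return node;	/* still -1 if the mask was empty (assumption) */
}
```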
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Ben Hutchings <bhutchings@solarflare.com>
Cc: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Avoid sparse warnings : add __rcu annotations and use
rcu_dereference_protected() where necessary.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
store_xps_map() allocates maps that are used by a single cpu, so it
makes sense to use NUMA allocations.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds XPS_CONFIG option to enable and disable XPS. This is
done in the same manner as RPS_CONFIG. This also fixes a build
failure in XPS code when SMP is not enabled.
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When testing struct netdev_queue state against FROZEN bit, we also test
XOFF bit. We can test both bits at once and save some cycles.
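For illustration only, a minimal sketch of the idea. The bit positions and
the MODEL_* names are hypothetical; only the combined-mask technique is the
point.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bit positions; the real state bits may differ. */
#define MODEL_XOFF		(1UL << 0)
#define MODEL_FROZEN		(1UL << 1)
#define MODEL_XOFF_OR_FROZEN	(MODEL_XOFF | MODEL_FROZEN)

static_assert((MODEL_XOFF & MODEL_FROZEN) == 0, "state bits must be distinct");

/* Two separate tests, one per bit. */
static bool stopped_two_tests(unsigned long state)
{
	return (state & MODEL_FROZEN) != 0 || (state & MODEL_XOFF) != 0;
}

/* One combined mask test, same result. */
static bool stopped_one_test(unsigned long state)
{
	return (state & MODEL_XOFF_OR_FROZEN) != 0;
}
```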
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As David pointed out correctly, updates to af-specific attributes
are currently not atomic. If multiple changes are requested and
one of them fails, previous updates may have been applied already
leaving the link behind in an undefined state.
This patch splits the function parse_link_af() into two functions
validate_link_af() and set_link_af(). validate_link_af() is called
from validate_linkmsg() to check for errors as early as possible before
any changes to the link have been made. set_link_af() is called to
commit the changes later.
This method is not foolproof. While it is currently sufficient
to make set_link_af() unable to fail and thus 100% atomic, the
validation function will not be able to detect all error
scenarios in the future: there will likely always be errors
depending on state which is, for example, not protected by
rtnl_mutex and thus may change between validation and setting.
Also, instead of silently ignoring unknown address families and
config blocks for address families which did not register a set
function, the errors EAFNOSUPPORT and EOPNOTSUPP respectively are
returned to avoid committing 4 out of 5 update requests without
notifying the user.
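For illustration only, a minimal userspace sketch of the validate-then-commit
split. The families, struct af_request, struct link_model and both helpers
are hypothetical; the -EINVAL rule for negative values is invented purely so
that validation has something to reject.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

enum { FAM_A, FAM_B, FAM_NO_SET, FAM_COUNT };	/* hypothetical families */

struct af_request { int family; int value; };
struct link_model { int conf[FAM_COUNT]; };

/* Phase 1: inspect one request without touching the link. */
static int validate_af_model(const struct af_request *req)
{
	if (req->family < 0 || req->family >= FAM_COUNT)
		return -EAFNOSUPPORT;	/* unknown address family */
	if (req->family == FAM_NO_SET)
		return -EOPNOTSUPP;	/* family registered no set function */
	if (req->value < 0)
		return -EINVAL;		/* invented rule for this sketch */
	return 0;
}

/* Phase 2 runs only once every request has passed phase 1. */
static int update_link_model(struct link_model *link,
			     const struct af_request *reqs, size_t n)
{
	size_t i;
	int err;

	assert(link && reqs);
	for (i = 0; i < n; i++) {
		err = validate_af_model(&reqs[i]);
		if (err)
			return err;	/* nothing has been modified yet */
	}
	for (i = 0; i < n; i++)
		link->conf[reqs[i].family] = reqs[i].value;
	return 0;
}
```

A failing request anywhere in the batch leaves the link untouched, which is
the property the split is meant to provide.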
Signed-off-by: Thomas Graf <tgraf@infradead.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch implements transmit packet steering (XPS) for multiqueue
devices. XPS selects a transmit queue during packet transmission based
on configuration. This is done by mapping the CPU transmitting the
packet to a queue. This is the transmit side analogue to RPS-- where
RPS is selecting a CPU based on receive queue, XPS selects a queue
based on the CPU (previously there was an XPS patch from Eric
Dumazet, but that might more appropriately be called transmit completion
steering).
Each transmit queue can be associated with a number of CPUs which will
use the queue to send packets. This is configured as a CPU mask on a
per queue basis in:
/sys/class/net/eth<n>/queues/tx-<n>/xps_cpus
The mappings are stored per device in an inverted data structure that
maps CPUs to queues. In the netdevice structure this is an array of
num_possible_cpu structures where each structure holds an array of
queue_indexes for queues which that CPU can use.
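For illustration only, a minimal userspace sketch of such an inverted map.
The MODEL_* sizes, struct cpu_queue_map and both helpers are hypothetical,
and choosing among several queues by a hash is an assumption of this sketch.

```c
#include <assert.h>

#define MODEL_NR_CPUS	4
#define MODEL_NR_QUEUES	4

/* Per-CPU entry of the inverted map: the queues this CPU may use. */
struct cpu_queue_map {
	int len;
	int queues[MODEL_NR_QUEUES];
};

/* Invert per-queue CPU masks (bit c set = CPU c may use the queue). */
static void xps_model_build(const unsigned long *queue_cpu_masks, int nqueues,
			    struct cpu_queue_map maps[MODEL_NR_CPUS])
{
	int cpu, q;

	assert(nqueues >= 0 && nqueues <= MODEL_NR_QUEUES);
	for (cpu = 0; cpu < MODEL_NR_CPUS; cpu++) {
		maps[cpu].len = 0;
		for (q = 0; q < nqueues; q++)
			if (queue_cpu_masks[q] & (1UL << cpu))
				maps[cpu].queues[maps[cpu].len++] = q;
	}
}

/* Returns a queue for the transmitting CPU, or -1 if none is configured. */
static int xps_model_pick(const struct cpu_queue_map *maps, int cpu,
			  unsigned int hash)
{
	const struct cpu_queue_map *m = &maps[cpu];

	if (m->len == 0)
		return -1;
	return m->queues[hash % (unsigned int)m->len];
}
```

The configuration is written per queue, but the transmit path asks "which
queues may this CPU use", so inverting once at configuration time keeps the
per-packet lookup to a single array index.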
The benefits of XPS are improved locality in the per queue data
structures. Also, transmit completions are more likely to be done
nearer to the sending thread, so this should promote locality back
to the socket on free (e.g. UDP). The benefits of XPS are dependent on
cache hierarchy, application load, and other factors. XPS would
nominally be configured so that a queue would only be shared by CPUs
which are sharing a cache; the degenerate configuration would be that
each CPU has its own queue.
Below are some benchmark results which show the potential benefit of
this patch. The netperf test has 500 instances of netperf TCP_RR test
with 1 byte req. and resp.
bnx2x on 16 core AMD
   XPS (16 queues, 1 TX queue per CPU)  1234K at 100% CPU
   No XPS (16 queues)                   996K at 100% CPU
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In dev_pick_tx, don't do work in calculating queue index or setting
the index in the sock unless the device has more than one queue. This
allows the sock to be set only with a queue index of a multi-queue
device, which is desirable if devices are stacked like in a tunnel.
We also allow the mapping of a socket to queue to be changed. To
maintain in-order packet transmission, a flag (ooo_okay) has been
added to the sk_buff structure. If a transport layer sets this flag
on a packet, the transmit queue can be changed for the socket.
Presumably, the transport would set this if there was no possibility
of creating OOO packets (for instance, there are no packets in flight
for the socket). This patch includes the modification in TCP output
for setting this flag.
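For illustration only, a minimal userspace sketch of the rule described
above. struct sock_model and pick_tx_model() are hypothetical, and the
"wanted" queue is passed in rather than computed.

```c
#include <assert.h>
#include <stdbool.h>

struct sock_model {
	int queue;	/* cached tx queue, -1 when unset */
};

static int pick_tx_model(struct sock_model *sk, int wanted, bool ooo_okay,
			 int num_queues)
{
	assert(num_queues >= 1 && wanted >= 0 && wanted < num_queues);
	if (num_queues == 1)
		return 0;		/* single queue: leave the sock untouched */
	if (sk->queue >= 0 && !ooo_okay)
		return sk->queue;	/* keep ordering: stay on the cached queue */
	sk->queue = wanted;		/* unset, or reordering is impossible */
	return wanted;
}
```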
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Lower SCM_MAX_FD from 255 to 253 so that allocations for scm_fp_list are
halved. (commit f8d570a4 added two pointers in this structure)
scm_fp_dup() should not copy whole structure (and trigger kmemcheck
warnings), but only the used part. While we are at it, only allocate
needed size.
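For illustration only, the arithmetic behind "halved" can be sketched as
follows. The layout (two list pointers, an int count padded to pointer
size, then the file pointer array) and the power-of-two size classes are
assumptions of this sketch; both helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed layout: 2 list pointers + padded int count + max_fd pointers. */
static size_t fp_list_model_size(size_t ptr_size, size_t max_fd)
{
	assert(ptr_size == 4 || ptr_size == 8);
	return 2 * ptr_size + ptr_size + max_fd * ptr_size;
}

/* Assumed allocator behaviour: round up to the next power of two. */
static size_t model_slab_size(size_t bytes)
{
	size_t slab = 32;

	while (slab < bytes)
		slab <<= 1;
	return slab;
}
```

With 8 byte pointers, 255 descriptors give 2064 bytes, which spills into a
4096 byte object, while 253 descriptors give exactly 2048 bytes.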
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Unloading the pktgen module needs ~6 seconds on a 64-cpu machine, to
stop 64 kthreads.
Add a pktgen_exiting variable to let kernel threads die faster, so that
kthread_stop() doesn't have to wait too long for them. This variable is
not tested in fast path.
Note : Before exiting from pktgen_thread_worker(), we must make sure
kthread_stop() is waiting for this thread to be stopped, as is done
in kernel/softirq.c
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We forgot to use __GFP_HIGHMEM in several __vmalloc() calls.
In ceph, add the missing flag.
In fib_trie.c, xfrm_hash.c and request_sock.c, using vzalloc() is
cleaner and allows using HIGHMEM pages as well.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
At compile time, we can replace the DIV_K instruction (divide by a
constant value) by a reciprocal divide.
At exec time, the expensive divide is replaced by a multiply, a less
expensive operation on most processors.
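For illustration only, a minimal userspace sketch of the general technique.
The helper names are hypothetical and, unlike a 32-bit reciprocal, this
sketch keeps the reciprocal in 64 bits. The result is exact when a * k is
below 2^32 and can be off by one for larger dividends, so this is a sketch
of the idea rather than a drop-in replacement for division.

```c
#include <assert.h>
#include <stdint.h>

/* Precompute ceil(2^32 / k); done once, when the filter is checked. */
static uint64_t recip_value(uint32_t k)
{
	assert(k != 0);
	return ((1ULL << 32) + k - 1) / k;
}

/* Approximate a / k with one multiply and one shift. */
static uint32_t recip_divide(uint32_t a, uint64_t r)
{
	return (uint32_t)(((uint64_t)a * r) >> 32);
}

/* Count mismatches against real division for a < max_a, 1 <= k <= max_k. */
static unsigned long recip_mismatches(uint32_t max_a, uint32_t max_k)
{
	unsigned long bad = 0;
	uint32_t a, k;

	for (k = 1; k <= max_k; k++) {
		uint64_t r = recip_value(k);

		for (a = 0; a < max_a; a++)
			if (recip_divide(a, r) != a / k)
				bad++;
	}
	return bad;
}
```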
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Changli Gao <xiaosuo@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Starting the translated instruction numbering at 1 instead of 0 allows
us to remove one decrement at check time and makes the codes[] array
init cleaner.
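For illustration only, a minimal sketch of why numbering from 1 helps: the
slots of a lookup table that are not listed are zero-initialized, so 0 can
directly mean "invalid" with no +1/-1 adjustment. All MODEL_* names and the
table size are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical internal codes; numbering starts at 1 so 0 means "invalid". */
enum { MODEL_S_RET_K = 1, MODEL_S_LD_IMM, MODEL_S_ALU_ADD_K };

static_assert(MODEL_S_RET_K == 1, "0 is reserved for invalid opcodes");

/* Hypothetical user-visible opcodes. */
#define MODEL_LD_IMM	0x00
#define MODEL_ADD_K	0x04
#define MODEL_RET_K	0x06

/* Unlisted slots are implicitly 0, i.e. invalid. */
static const uint8_t model_codes[16] = {
	[MODEL_LD_IMM]	= MODEL_S_LD_IMM,
	[MODEL_ADD_K]	= MODEL_S_ALU_ADD_K,
	[MODEL_RET_K]	= MODEL_S_RET_K,
};

/* Returns the internal code, or 0 when the opcode is not supported. */
static uint8_t model_translate(uint16_t code)
{
	if (code >= sizeof(model_codes) / sizeof(model_codes[0]))
		return 0;
	return model_codes[code];
}
```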
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Changli Gao <xiaosuo@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove pc variable to avoid arithmetic to compute fentry at each filter
instruction. Jumps directly manipulate fentry pointer.
As the last instruction of filter[] is guaranteed to be a RETURN, and
all jumps are before the last instruction, we don't need to check filter
bounds (number of instructions in filter array) at each iteration, so we
remove it from sk_run_filter() params.
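For illustration only, a minimal userspace sketch of an interpreter loop
that walks an instruction pointer directly, with no pc index and no length
argument. The opcodes, struct model_insn and model_run() are hypothetical
and far smaller than the real instruction set.

```c
#include <assert.h>
#include <stdint.h>

enum { M_LD_IMM, M_ADD_K, M_JEQ_K, M_RET_K, M_RET_A };	/* hypothetical */

struct model_insn {
	uint8_t code;
	uint8_t jt, jf;		/* jump offsets, relative to the next insn */
	uint32_t k;
};

/*
 * The caller guarantees that the program ends with a return and that
 * every jump lands before it, so no bounds check is needed in the loop.
 */
static uint32_t model_run(const struct model_insn *fentry)
{
	uint32_t A = 0;

	assert(fentry);
	for (;; fentry++) {
		switch (fentry->code) {
		case M_LD_IMM:
			A = fentry->k;
			continue;
		case M_ADD_K:
			A += fentry->k;
			continue;
		case M_JEQ_K:
			/* Jumps move the pointer itself. */
			fentry += (A == fentry->k) ? fentry->jt : fentry->jf;
			continue;
		case M_RET_K:
			return fentry->k;
		case M_RET_A:
			return A;
		default:
			return 0;
		}
	}
}
```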
On x86_32 remove f_k var introduced in commit 57fe93b374
(filter: make sure filters dont read uninitialized memory)
Note : We could use a CONFIG_ARCH_HAS_{FEW|MANY}_REGISTERS in order to
avoid too many ifdefs in this code.
This helps the compiler use cpu registers to hold fentry and the A
accumulator.
On x86_32, this saves 401 bytes and, more importantly, sk_run_filter()
runs much faster because of lower register pressure (one less
conditional branch per BPF instruction)
# size net/core/filter.o net/core/filter_pre.o
   text    data     bss     dec     hex filename
   2948       0       0    2948     b84 net/core/filter.o
   3349       0       0    3349     d15 net/core/filter_pre.o
on x86_64 :
# size net/core/filter.o net/core/filter_pre.o
   text    data     bss     dec     hex filename
   5173       0       0    5173    1435 net/core/filter.o
   5224       0       0    5224    1468 net/core/filter_pre.o
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Changli Gao <xiaosuo@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix kernel-doc warning for sk_filter_rcu_release():
Warning(net/core/filter.c:586): missing initial short description on line:
* sk_filter_rcu_release: Release a socket filter by rcu_head
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: netdev@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
BPF_S_* codes are used internally and should not be exposed to others.
Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Hagen Paul Pfeifer <hagen@jauu.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Repeating the u16 to u8 value conversion with one case statement per
value in a switch() is wasteful, so this patch introduces a u16 to u8
mapping table and removes most of the case statements. As a result, the
size of net/core/filter.o is reduced by about 29% on x86.
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add an option to pktgen to set skb priority. Useful for testing
QOS features. Also, by running pktgen on the vlan device, the
qdisc on the real device can be tested.
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
netif_set_real_num_rx_queues() can decrement and increment
the number of rx queues. For example ixgbe does this as
features and offloads are toggled. Presumably this could
also happen across down/up on most devices if the available
resources changed (cpu offlined).
The kobject needs to be zero'd in this case so that the
state is not preserved across kobject_put()/kobject_init_and_add().
This resolves the following error report.
ixgbe 0000:03:00.0: eth2: NIC Link is Up 10 Gbps, Flow Control: RX/TX
kobject (ffff880324b83210): tried to init an initialized object, something is seriously wrong.
Pid: 1972, comm: lldpad Not tainted 2.6.37-rc18021qaz+ #169
Call Trace:
[<ffffffff8121c940>] kobject_init+0x3a/0x83
[<ffffffff8121cf77>] kobject_init_and_add+0x23/0x57
[<ffffffff8107b800>] ? mark_lock+0x21/0x267
[<ffffffff813c6d11>] net_rx_queue_update_kobjects+0x63/0xc6
[<ffffffff813b5e0e>] netif_set_real_num_rx_queues+0x5f/0x78
[<ffffffffa0261d49>] ixgbe_set_num_queues+0x1c6/0x1ca [ixgbe]
[<ffffffffa0262509>] ixgbe_init_interrupt_scheme+0x1e/0x79c [ixgbe]
[<ffffffffa0274596>] ixgbe_dcbnl_set_state+0x167/0x189 [ixgbe]
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Each net_device contains address family specific data such as
per device settings and statistics. We already expose this data
via procfs/sysfs and partially netlink.
The netlink method requires the requester to send one RTM_GETLINK
request for each address family it wishes to receive data of
and then merge this data itself.
This patch implements a new API which combines all address family
specific link data in a new netlink attribute IFLA_AF_SPEC.
IFLA_AF_SPEC contains a sequence of nested attributes, one for each
address family which in turn defines the structure of its own
attribute. Example:
[IFLA_AF_SPEC] = {
[AF_INET] = {
[IFLA_INET_CONF] = ...,
},
[AF_INET6] = {
[IFLA_INET6_FLAGS] = ...,
[IFLA_INET6_CONF] = ...,
}
}
The API also allows for address families to implement a function
which parses the IFLA_AF_SPEC attribute sent by userspace to
implement address family specific link options.
Signed-off-by: Thomas Graf <tgraf@infradead.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
ERROR: "netif_get_vlan_features" [drivers/net/xen-netfront.ko] undefined!
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch moves RX queue allocation to alloc_netdev_mq and freeing of
the queues to free_netdev (symmetric to TX queue allocation). Each
kobject RX queue takes a reference to the queue's device so that the
device can't be freed before all the kobjects have been released-- this
obviates the need for reference counts specific to RX queues.
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
TX queues are now allocated in alloc_netdev_mq and freed in
free_netdev.
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We currently use vlan_features to check for TSO support if there is
a vlan tag. However, it's quite likely that the NIC is not able to
do TSO when there is an arbitrary number of tags. Therefore if there
is more than one tag (in-band or out-of-band), fall back to software
emulation.
Signed-off-by: Jesse Gross <jesse@nicira.com>
CC: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We assume that hardware TSO can't support multiple levels of vlan tags
but we allow it to be done. Therefore, enable GSO to parse these tags
so we can fall back to software.
Signed-off-by: Jesse Gross <jesse@nicira.com>
CC: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When checking if it is necessary to linearize a packet, we currently
use vlan_features if the packet contains either an in-band or out-
of-band vlan tag. However, in-band tags aren't special in any way
for scatter/gather since they are part of the packet buffer and are
simply more data to DMA. Therefore, only use vlan_features for out-
of-band tags, which could potentially have some interaction with
scatter/gather.
Signed-off-by: Jesse Gross <jesse@nicira.com>
CC: Ben Hutchings <bhutchings@solarflare.com>
Reviewed-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
nlmsg_total_size() calculates the length of a netlink message
including header and alignment. nla_total_size() calculates the
space an individual attribute consumes which was meant to be used
in this context.
Also, ensure to account for the attribute header for the
IFLA_INFO_XSTATS attribute as implementations of get_xstats_size()
seem to assume that we do so.
The addition of two message headers minus the missing attribute
header resulted in a calculated message size that was larger than
required. Therefore we never risked running out of skb tailroom.
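For illustration only, a minimal sketch of the size arithmetic, assuming a
16 byte message header, a 4 byte attribute header and 4 byte alignment. The
model_* helpers are hypothetical stand-ins for nlmsg_total_size() and
nla_total_size().

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_ALIGNTO	4u	/* assumed netlink alignment */
#define MODEL_NLMSG_HDR	16u	/* assumed size of the message header */
#define MODEL_NLA_HDR	4u	/* assumed size of the attribute header */

static size_t model_align(size_t len)
{
	return (len + MODEL_ALIGNTO - 1) & ~(size_t)(MODEL_ALIGNTO - 1);
}

/* Whole message: message header plus payload, aligned. */
static size_t model_nlmsg_total_size(size_t payload)
{
	return model_align(MODEL_NLMSG_HDR + payload);
}

/* Single attribute: attribute header plus payload, aligned. */
static size_t model_nla_total_size(size_t payload)
{
	assert(payload <= (size_t)-1 - MODEL_NLA_HDR - MODEL_ALIGNTO);
	return model_align(MODEL_NLA_HDR + payload);
}
```

Under these assumptions, sizing an attribute with the message helper
overestimates it by 12 bytes, which is why the old calculation was too
large rather than too small.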
Signed-off-by: Thomas Graf <tgraf@infradead.org>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Robin Holt tried to boot a 16TB machine and found some limits were
reached : sysctl_tcp_mem[2], sysctl_udp_mem[2]
We can switch infrastructure to use "long" instead of "int", now that
atomic_long_t primitives are available for free.
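For illustration only, the overflow can be sketched with a little
arithmetic, assuming 4 KiB pages and limits that are derived from page
counts. model_nr_pages() and fits_in_int() are hypothetical helpers.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

/* Number of pages needed to cover 'bytes' of memory. */
static uint64_t model_nr_pages(uint64_t bytes, uint64_t page_size)
{
	assert(page_size != 0);
	return bytes / page_size;
}

/* True when the value can be stored in a signed int without overflow. */
static bool fits_in_int(uint64_t v)
{
	return v <= (uint64_t)INT_MAX;
}
```

16 TiB divided by 4 KiB is 2^32 pages, which no longer fits in a 32-bit
int, whereas 1 TiB (2^28 pages) still does.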
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Reported-by: Robin Holt <holt@sgi.com>
Reviewed-by: Robin Holt <holt@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
There is a possibility malicious users can get limited information about
uninitialized stack mem array. Even if sk_run_filter() result is bound
to packet length (0 .. 65535), we could imagine this can be used by
hostile user.
Initializing the mem[] array, as Dan Rosenberg suggested in his patch, is
expensive since most filters don't even use this array.
It's hard to do the filter validation in sk_chk_filter(), because of
the jumps. This might be done later.
In this patch, I use a bitmap (a single long var) so that only filters
using mem[] loads/stores pay the price of added security checks.
For other filters, additional cost is a single instruction.
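For illustration only, a minimal userspace sketch of the bitmap idea.
struct scratch_model and its helpers are hypothetical; returning 0 for a
slot that was never stored is an assumption of this sketch.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_MEMWORDS	16

struct scratch_model {
	uint32_t mem[MODEL_MEMWORDS];	/* deliberately left uninitialized */
	unsigned long mem_valid;	/* bit i set once mem[i] was stored */
};

/* Only the bitmap is cleared: one store instead of sixteen. */
static void scratch_init(struct scratch_model *s)
{
	s->mem_valid = 0;
}

static void scratch_store(struct scratch_model *s, unsigned int idx,
			  uint32_t val)
{
	assert(idx < MODEL_MEMWORDS);
	s->mem[idx] = val;
	s->mem_valid |= 1UL << idx;
}

/* A slot that was never stored reads as 0, never as stale stack data. */
static uint32_t scratch_load(const struct scratch_model *s, unsigned int idx)
{
	assert(idx < MODEL_MEMWORDS);
	return (s->mem_valid & (1UL << idx)) ? s->mem[idx] : 0;
}
```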
[ Since we access fentry->k a lot now, cache it in a local variable
and mark filter entry pointer as const. -DaveM ]
Reported-by: Dan Rosenberg <drosenberg@vsecurity.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Followup of commit ef885afbf8 (net: use rcu_barrier() in
rollback_registered_many)
dst_dev_event() scans a garbage dst list that might be fed by various
network notifiers at device dismantle time.
It's important to call dst_dev_event() after other notifiers, or we might
enter the infamous msleep(250) in netdev_wait_allrefs(), and wait one
second before calling call_netdevice_notifiers(NETDEV_UNREGISTER,
dev) again to properly remove last device references.
Use priority -10 to let dst_dev_notifier be called after other network
notifiers (they have the default 0 priority)
Reported-by: Ben Greear <greearb@candelatech.com>
Reported-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Reported-by: Octavian Purdila <opurdila@ixiacom.com>
Reported-by: Benjamin LaHaise <bcrl@kvack.org>
Tested-by: Ben Greear <greearb@candelatech.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This fixes a bug reported by backyes.
The very first time, pktgen uses a queue_map that has not been
initialized by set_cur_queue_map(pkt_dev);
Signed-off-by: Junchang Wang <junchangwang@gmail.com>
Signed-off-by: Backyes <backyes@mail.ustc.edu.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
This should fix the following warning:
net/core/pktgen.c: In function ‘pktgen_if_write’:
net/core/pktgen.c:890: warning: comparison of distinct pointer types lacks a cast
Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
Reviewed-by: Nelson Elhage <nelhage@ksplice.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In dev_pick_tx recompute the queue index if the value stored in the
socket is greater than or equal to the number of real queues for the
device. The saved index in the sock structure is not guaranteed to
be appropriate for the egress device (this could happen on a route
change or in presence of tunnelling). The result of the queue index
being bad would be to return a bogus queue (crash could presumably
follow).
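For illustration only, a minimal sketch of the range check.
model_queue_index() is a hypothetical helper, and falling back to a hash
modulo the queue count is an assumption of this sketch.

```c
#include <assert.h>

static unsigned int model_queue_index(int saved, unsigned int hash,
				      unsigned int real_num_tx_queues)
{
	assert(real_num_tx_queues > 0);
	/* Unset or out of range for this device: recompute. */
	if (saved < 0 || (unsigned int)saved >= real_num_tx_queues)
		return hash % real_num_tx_queues;
	return (unsigned int)saved;
}
```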
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
A program that accidentally writes too much data to the pktgen file can overflow
the kernel stack and oops the machine. This is only triggerable by root, so
there's no security issue, but it's still an unfortunate bug.
printk() won't print more than 1024 bytes in a single call anyway, so let's
just never copy more than that much data. We're on a fairly shallow stack, so
that should be safe even with CONFIG_4KSTACKS.
Signed-off-by: Nelson Elhage <nelhage@ksplice.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This helps protect us from overflow issues down in the
individual protocol sendmsg/recvmsg handlers. Once
we hit INT_MAX we truncate out the rest of the iovec
by setting the iov_len members to zero.
This works because:
1) For SOCK_STREAM and SOCK_SEQPACKET sockets, partial
writes are allowed and the application will just continue
with another write to send the rest of the data.
2) For datagram oriented sockets, where there must be a
one-to-one correspondence between write() calls and
packets on the wire, INT_MAX is going to be far larger
than the packet size limit the protocol is going to
check for and signal with -EMSGSIZE.
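For illustration only, a minimal userspace sketch of the truncation.
struct iov_model stands in for the real iovec and clamp_iov_model() is a
hypothetical helper.

```c
#include <assert.h>
#include <limits.h>

struct iov_model {
	unsigned long long len;	/* stands in for iov_len */
};

/*
 * Cap the summed length at INT_MAX. The segment that crosses the limit is
 * shortened and every later segment ends up with length zero.
 */
static int clamp_iov_model(struct iov_model *iov, int count)
{
	int total = 0;
	int i;

	assert(count >= 0);
	for (i = 0; i < count; i++) {
		unsigned long long len = iov[i].len;

		if (len > (unsigned long long)(INT_MAX - total)) {
			len = (unsigned long long)(INT_MAX - total);
			iov[i].len = len;
		}
		total += (int)len;
	}
	return total;
}
```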
Based upon a patch by Linus Torvalds.
Signed-off-by: David S. Miller <davem@davemloft.net>
Adds __rcu annotation to (struct fib_rule)->ctarget
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
NETIF_F_HW_CSUM indicates the ability to update a TCP/IP-style 16-bit
checksum with the checksum of an arbitrary part of the packet data,
whereas the FCoE CRC is something entirely different.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Cc: stable@kernel.org [2.6.32+]
Signed-off-by: David S. Miller <davem@davemloft.net>
dev_can_checksum() incorrectly returns true in these cases:
1. The skb has both out-of-band and in-band VLAN tags and the device
supports checksum offload for the encapsulated protocol but only with
one layer of encapsulation.
2. The skb has a VLAN tag and the device supports generic checksumming
but not in conjunction with VLAN encapsulation.
Rearrange the VLAN tag checks to avoid these.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Some panic reports in fib_rules_lookup() show a rule could have a NULL
pointer as a next pointer in the rules_list.
This can actually happen because of a bug in fib_nl_newrule() : It
checks if current rule is the destination of unresolved gotos. (Other
rules have gotos to this about to be inserted rule)
Problem is it does the resolution of the gotos before the rule is
inserted in the rules_list (and has a valid next pointer)
Fix this by moving the rules_list insertion before the changes on gotos.
A lockless reader can no longer follow a ctarget pointer unless the
destination is ready (has a valid next pointer)
Reported-by: Oleg A. Arkhangelsky <sysoleg@yandex.ru>
Reported-by: Joe Buehler <aspam@cox.net>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add __rcu annotation to :
(struct sock)->sk_filter
And use appropriate rcu primitives to reduce sparse warnings if
CONFIG_SPARSE_RCU_POINTER=y
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>