Perf tools session at NFWS 2010 pointed out a false sharing on struct
fib_alias that can be avoided pretty easily, if we set FA_S_ACCESSED bit
only if needed (ie : not already set)
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As skb->protocol is not valid in LOCAL_OUT add
parameter for address family in packet debugging functions.
Even if ports are not present in AH and ESP change them to
use ip_vs_tcpudp_debug_packet to show at least valid addresses
as before. This patch removes the last user of skb->protocol
in IPVS.
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
Connections in backup server should inherit the
forwarding method from real server. It is a way to fix a
problem where the forwarding method in backup connection
is damaged by logical OR operation with the real server's
connection flags. And the change is needed for setups
where the backup server uses different forwarding method
for the same real servers.
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
This patch deals with local client processing.
Prefer LOCAL_OUT hook for scheduling connections from
local clients. LOCAL_IN is still supported if the packets are
not marked as processed in LOCAL_OUT. The idea to process
requests in LOCAL_OUT is to alter conntrack reply before
it is confirmed at POST_ROUTING. If the local requests are
processed in LOCAL_IN the conntrack can not be updated
and matching by state is impossible.
Add the following handlers:
- ip_vs_reply[46] at LOCAL_IN:99 to process replies from
remote real servers to local clients. Now when both
replies from remote real servers (ip_vs_reply*) and
local real servers (ip_vs_local_reply*) are handled
it is safe to remove the conn_out_get call from ip_vs_in
because it does not support related ICMP packets.
- ip_vs_local_request[46] at LOCAL_OUT:-98 to process
requests from local client
Handling in LOCAL_OUT causes some changes:
- as skb->dev, skb->protocol and skb->pkt_type are not defined
in LOCAL_OUT make sure we set skb->dev before calling icmpv6_send,
prefer skb_dst(skb) for struct net and remove the skb->protocol
checks from TUN transmitters.
[ horms@verge.net.au: removed trailing whitespace ]
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
This patch deals with local real servers:
- Add support for DNAT to local address (different real server port).
It needs ip_vs_out hook in LOCAL_OUT for both families because
skb->protocol is not set for locally generated packets and can not
be used to set 'af'.
- Skip packets in ip_vs_in marked with skb->ipvs_property because
ip_vs_out processing can be executed in LOCAL_OUT but we still
have the conn_out_get check in ip_vs_in.
- Ignore packets with inet->nodefrag from local stack
- Require skb_dst(skb) != NULL because we use it to get struct net
- Add support for changing the route to local IPv4 stack after DNAT
depending on the source address type. Local client sets output
route and the remote client sets input route. It looks like
IPv6 does not need such rerouting because the replies use
addresses from initial incoming header, not from skb route.
- All transmitters now have strict checks for the destination
address type: redirect from non-local address to local real
server requires NAT method, local address can not be used as
source address when talking to remote real server.
- Now LOCALNODE is not set explicitly as forwarding
method in real server to allow the connections to provide
correct forwarding method to the backup server. Not sure if
this breaks tools that expect to see 'Local' real server type.
If needed, this can be supported with new flag IP_VS_DEST_F_LOCAL.
Now it should be possible connections in backup that lost
their fwmark information during sync to be forwarded properly
to their daddr, even if it is local address in the backup server.
By this way backup could be used as real server for DR or TUN,
for NAT there are some restrictions because tuple collisions
in conntracks can create problems for the traffic.
- Call ip_vs_dst_reset when destination is updated in case
some real server IP type is changed between local and remote.
[ horms@verge.net.au: removed trailing whitespace ]
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
Currently, ip_route_me_harder after ip_vs_out_icmp
is called even if packet is not related to IPVS connection.
Move it into handle_response_icmp. Also, force rerouting
if sending to local client because IPv4 stack uses addresses
from the route.
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
Create new function ip_vs_defrag_user to return correct
IP_DEFRAG_xxx user depending on the hooknum. It will be needed
when we add handlers in LOCAL_OUT.
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
The recent change in IP_VS_XMIT_TUNNEL to set
CHECKSUM_NONE is not correct. After adding IPIP header
skb->csum becomes invalid but the CHECKSUM_PARTIAL
case must be supported. So, use skb_forward_csum() which is
most suitable for us to allow local clients to send IPIP
to remote real server.
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
Delivering locally ICMP from FORWARD hook is not supported.
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
This patch is needed to avoid scheduling of
packets from local real server when we add ip_vs_in
in LOCAL_OUT hook to support local client.
Currently, when ip_vs_in can not find existing
connection it tries to create new one by calling ip_vs_schedule.
The default indication from ip_vs_schedule was if
connection was scheduled to real server. If real server is
not available we try to use the bypass forwarding method
or to send ICMP error. But in some cases we do not want to use
the bypass feature. So, add flag 'ignored' to indicate if
the scheduler ignores this packet.
Make sure we do not create new connections from replies.
We can hit this problem for persistent services and local real
server when ip_vs_in is added to LOCAL_OUT hook to handle
local clients.
Also, make sure ip_vs_schedule ignores SYN packets
for Active FTP DATA from local real server. The FTP DATA
connection should be created on SYN+ACK from client to assign
correct connection daddr.
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
Change skb->ipvs_property semantic. This is preparation
to support ip_vs_out processing in LOCAL_OUT. ipvs_property=1
will be used to avoid expensive lookups for traffic sent by
transmitters. Now when conntrack support is not used we call
ip_vs_notrack method to avoid problems in OUTPUT and
POST_ROUTING hooks instead of exiting POST_ROUTING as before.
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
Avoid full checksum calculation for apps that can provide
info whether csum was broken after payload mangling. For now only
ip_vs_ftp mangles payload and it updates the csum, so the full
recalculation is avoided for all packets.
Add CHECKSUM_UNNECESSARY for snat_handler (TCP and UDP).
It is needed to support SNAT from local address for the case
when csum is fully recalculated.
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
Fix CHECKSUM_PARTIAL handling. Tested for IPv4 TCP,
UDP not tested because it needs network card with HW CSUM support.
May be fixes problem where IPVS can not be used in virtual boxes.
Problem appears with DNAT to local address when the local stack
sends reply in CHECKSUM_PARTIAL mode.
Fix tcp_dnat_handler and udp_dnat_handler to provide
vaddr and daddr in right order (old and new IP) when calling
tcp_partial_csum_update/udp_partial_csum_update (CHECKSUM_PARTIAL).
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
If dma mapping fail we are dropping packages or fail to open device.
But exact reason of drop/fail stays unknow for a user, so print errors.
Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix switching device to low-speed mode after resume reported in:
https://bugzilla.redhat.com/show_bug.cgi?id=502974
Reported-and-tested-by: Laurentiu Badea <bugzilla-redhat@wotevah.com>
Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since we do not change rx buffer size any longer, we can
clean up rtl8169_change_mtu and in consequence rtl8169_down.
Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Only increase tx_{packets,dropped} statistics when transmit or drop
full skb, not just fragment.
Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Check possible dma mapping errors and do clean up if it happens.
Fix overwrap bug in rtl8169_tx_clear on the way.
Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Convert bvec_k{un,}map_irq() from macros to static inline functions if
!CONFIG_HIGHMEM, so we can easier detect mistakes like the one fixed in
93055c3104 ("ps3disk: passing wrong variable =
to
bvec_kunmap_irq()")
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
Make the bnx2x driver use the new vlan accleration model.
Signed-off-by: Hao Zheng <hzheng@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
CC: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Make the ixgbe driver use the new vlan accleration model.
Signed-off-by: Jesse Gross <jesse@nicira.com>
CC: Peter Waskiewicz <peter.p.waskiewicz.jr@intel.com>
CC: Emil Tantilov <emil.s.tantilov@intel.com>
CC: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Make the bnx2 driver use the new vlan accleration model.
Signed-off-by: Jesse Gross <jesse@nicira.com>
CC: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If some of the underlying devices support it, enable vlan offload on
transmit for bridge devices. This allows senders to take advantage of the
hardware support, similar to other forms of acceleration.
Signed-off-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Now that vlan acceleration is handled consistently regardless of usage,
it is possible to enable and disable it at will. This adds support for
Ethtool operations that change the offloading status for debugging
purposes, similar to other forms of hardware acceleration.
Signed-off-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently each driver that is capable of vlan hardware acceleration
must be aware of the vlan groups that are configured and then pass
the stripped tag to a specialized receive function. This is
different from other types of hardware offload in that it places a
significant amount of knowledge in the driver itself rather keeping
it in the networking core.
This makes vlan offloading function more similarly to other forms
of offloading (such as checksum offloading or TSO) by doing the
following:
* On receive, stripped vlans are passed directly to the network
core, without attempting to check for vlan groups or reconstructing
the header if no group
* vlans are made less special by folding the logic into the main
receive routines
* On transmit, the device layer will add the vlan header in software
if the hardware doesn't support it, instead of spreading that logic
out in upper layers, such as bonding.
There are a number of advantages to this:
* Fixes all bugs with drivers incorrectly dropping vlan headers at once.
* Avoids having to disable VLAN acceleration when in promiscuous mode
(good for bridging since it always puts devices in promiscuous mode).
* Keeps VLAN tag separate until given to ultimate consumer, which
avoids needing to do header reconstruction as in tg3 unless absolutely
necessary.
* Consolidates common code in core networking.
Signed-off-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
A struct net_device always maps to zero or one vlan groups and we
always know the device when we are looking up a group. We currently
do a hash table lookup on the device to find the group but it is
much simpler to just store a pointer.
Signed-off-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently users of hardware vlan accleration need to know whether
the device supports it before generating packets. However, vlan
acceleration will soon be available in a more flexible manner so
knowing ahead of time becomes much more difficult. This adds
a software fallback path for vlan packets on devices without the
necessary offloading support, similar to other types of hardware
accleration.
Signed-off-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Many (but not all) drivers check to see whether there is a vlan
group configured before using a tag stored in the skb. There's
not much point in this check since it just throws away data that
should only be present in the expected circumstances. However,
it will soon be legal and expected to get a vlan tag when no
vlan group is configured, so remove this check from all drivers
to avoid dropping the tags.
Signed-off-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
VLAN_GROUP_ARRAY_LEN is simply the number of possible vlan VIDs.
Since vlan groups will soon be more of an implementation detail
for vlan devices, rename the constant to be descriptive of its
actual purpose.
Signed-off-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
An upcoming commit will allow packets with hardware vlan acceleration
information to be passed though more parts of the network stack, including
packets trunked through the bridge. This adds support for matching and
filtering those packets through ebtables.
Signed-off-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix a log message
Signed-off-by: Vasanthy Kolluri <vkolluri@cisco.com>
Signed-off-by: Roopa Prabhu <roprabhu@cisco.com>
Signed-off-by: David Wang <dwang2@cisco.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Change min MTU to 68.
Signed-off-by: Vasanthy Kolluri <vkolluri@cisco.com>
Signed-off-by: Roopa Prabhu <roprabhu@cisco.com>
Signed-off-by: David Wang <dwang2@cisco.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Replace no wait CMD_ENABLE firmware devcmd with CMD_ENABLE_WAIT
Signed-off-by: Vasanthy Kolluri <vkolluri@cisco.com>
Signed-off-by: Roopa Prabhu <roprabhu@cisco.com>
Signed-off-by: David Wang <dwang2@cisco.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Let the firmware know about the mac address set by the user using ndo_set_mac_address
Signed-off-by: Vasanthy Kolluri <vkolluri@cisco.com>
Signed-off-by: Roopa Prabhu <roprabhu@cisco.com>
Signed-off-by: David Wang <dwang2@cisco.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support for multiple hardware receive queues. The ingress traffic is hashed into one of the receive queues based on IP or TCP or both headers. The max no. of receive queues supported is 8.
Signed-off-by: Vasanthy Kolluri <vkolluri@cisco.com>
Signed-off-by: Roopa Prabhu <roprabhu@cisco.com>
Signed-off-by: David Wang <dwang2@cisco.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If we get a bit-flip of ECC error while reading the data area, do not add it to
corrupted list, because it is possible that this is just unstable PEB with
corruptions caused by unclean reboots.
This patch also improves commentaries.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
When the data does not contain all 0xFF bytes, 'check_data_ff()' should return
1, not -EINVAL; Also, the caller ('process_eb()') should not add the PEB to the
"corrupted" list if there was a read error.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
While scanning the flash we read all VID headers and store some important
information in 'struct ubi_scan_leb'. Store also the 'copy_flag' value there
as it is needed when comparing LEBs. We do not increase memory consumption
because this is just one bit and we have plenty of spare bits in
'struct ubi_scan_leb' (sizeof(struct ubi_scan_leb) is 48 both with and
without this patch).
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
In 'ubifs_replay_journal()' we allocate 'sbuf' for scanning the log.
However, we already have 'c->sbuf' for these purposes, so do not
allocate yet another one. This reduces UBIFS memory consumption while
recovering.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
This is a bug-fix: when we unmount, and we are currently in R/O
mode because of an error - we do not sync write-buffers, which
means we also do not cancel write-buffer timers we may possibly
have armed. This patch fixes the issue.
The issue can easily be reproduced by enabling UBIFS failure debug
mode (echo 4 > /sys/module/ubifs/parameters/debug_tsts) and
unmounting as soon as a failure happen. At some point the system
oopses because we have an armed hrtimer but UBIFS is unmounted
already.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
This is a clean-up patch which:
1. Removes explicite 'hrtimer_cancel()' after 'ubifs_wbuf_sync()' in
'ubifs_remount_ro()', because the timers will be canceled by
'ubifs_wbuf_sync()', no need to cancel them for the second time.
2. Remove "if (c->jheads)" check from 'ubifs_put_super()', because
at journal heads must always be allocated there, since we checked
earlier that we were mounted R/W, and the olny situation when
journal heads are not allocated is when mounter or re-mounted R/O.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>