Commit graph

104190 commits

Author SHA1 Message Date
Roland Dreier
8294f29767 RDMA/nes: Get rid of ring_doorbell parameter of nes_post_cqp_request()
Every caller of nes_post_cqp_request() passed it NES_CQP_REQUEST_RING_DOORBELL,
so just remove that parameter and always ring the doorbell.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
Acked-by: Faisal Latif <flatif@neteffect.com>
2008-07-14 23:48:49 -07:00
Jon Mason
52c8084b74 RDMA/cxgb3: Propagate HW page size capabilities
cxgb3 does not currently report the page size capabilities, and
incorrectly reports them internally.

This version changes the bit-shifting to a static value (per Steve's
request).

Signed-off-by: Jon Mason <jon@opengridcomputing.com>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-14 23:48:49 -07:00
Roland Dreier
1ff66e8c1f RDMA/nes: Encapsulate logic nes_put_cqp_request()
The iw_nes driver repeats the logic

	if (atomic_dec_and_test(&cqp_request->refcount)) {
		if (cqp_request->dynamic) {
			kfree(cqp_request);
		} else {
			spin_lock_irqsave(&nesdev->cqp.lock, flags);
			list_add_tail(&cqp_request->list, &nesdev->cqp_avail_reqs);
			spin_unlock_irqrestore(&nesdev->cqp.lock, flags);
		}
	}

over and over.  Wrap this up in functions nes_free_cqp_request() and
nes_put_cqp_request() to simplify such code.

In addition to making the source smaller and more readable, this shrinks
the compiled code quite a bit:

add/remove: 2/0 grow/shrink: 0/13 up/down: 164/-1692 (-1528)
function                                     old     new   delta
nes_free_cqp_request                           -     147    +147
nes_put_cqp_request                            -      17     +17
nes_modify_qp                               2316    2293     -23
nes_hw_modify_qp                             737     657     -80
nes_dereg_mr                                 945     860     -85
flush_wqes                                   501     416     -85
nes_manage_apbvt                             648     560     -88
nes_reg_mr                                  1117    1026     -91
nes_cqp_ce_handler                           927     769    -158
nes_alloc_mw                                1052     884    -168
nes_create_qp                               5314    5141    -173
nes_alloc_fmr                               2212    2035    -177
nes_destroy_cq                              1097     918    -179
nes_create_cq                               2787    2598    -189
nes_dealloc_mw                               762     566    -196

Signed-off-by: Roland Dreier <rolandd@cisco.com>
Acked-by: Faisal Latif <flatif@neteffect.com>
2008-07-14 23:48:49 -07:00
Moni Shoua
ee1e2c82c2 IPoIB: Refresh paths instead of flushing them on SM change events
The patch tries to solve the problem of device going down and paths being
flushed on an SM change event. The method is to mark the paths as candidates for
refresh (by setting the new valid flag to 0), and wait for an ARP
probe a new path record query.

The solution requires a different and less intrusive handling of SM
change event. For that, the second argument of the flush function
changes its meaning from a boolean flag to a level.  In most cases, SM
failover doesn't cause LID change so traffic won't stop.  In the rare
cases of LID change, the remote host (the one that hadn't changed its
LID) will lose connectivity until paths are refreshed. This is no
worse than the current state.  In fact, preventing the device from
going down saves packets that otherwise would be lost.

Signed-off-by: Moni Levy <monil@voltaire.com>
Signed-off-by: Moni Shoua <monis@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-14 23:48:49 -07:00
Joachim Fenkes
038919f296 IB/ehca: Make device table externally visible
This gives ehca an autogenerated modalias and therefore enables automatic loading.

Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-14 23:48:49 -07:00
Vladimir Sokolovsky
af40da894e IPoIB: add LRO support
Add "ipoib_use_lro" module parameter to enable LRO and an
"ipoib_lro_max_aggr" module parameter to set the max number of packets
to be aggregated.  Make LRO controllable and LRO statistics accessible
through ethtool.

Signed-off-by: Vladimir Sokolovsky <vlad@mellanox.co.il>
Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-14 23:48:48 -07:00
Ron Livne
1240673405 IPoIB: Use multicast loopback blocking if available
Set IB_QP_CREATE_BLOCK_MULTICAST_LOOPBACK for IPoIB's UD QPs if
supported by the underlying device.  This creates an improvement of up
to 39% in bandwidth when sending multicast packets with IPoIB, and an
improvment of 12% in cpu usage.

Signed-off-by: Ron Livne <ronli@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-14 23:48:48 -07:00
Ron Livne
521e575b9a IB/mlx4: Add support for blocking multicast loopback packets
Add support for handling the IB_QP_CREATE_MULTICAST_BLOCK_LOOPBACK
flag by using the per-multicast group loopback blocking feature of
mlx4 hardware.

Signed-off-by: Ron Livne <ronli@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-14 23:48:48 -07:00
Ron Livne
47ee1b9f2e IB/core: Add support for multicast loopback blocking
This patch also adds a creation flag for QPs,
IB_QP_CREATE_MULTICAST_BLOCK_LOOPBACK, which when set means that
multicast sends from the QP to a group that the QP is attached to will
not be looped back to the QP's receive queue.  This can be used to
save receive resources when a consumer does not want a local copy of
multicast traffic; for example IPoIB must waste CPU time throwing away
such local copies of multicast traffic.

This patch also adds a device capability flag that shows whether a
device supports this feature or not.

Signed-off-by: Ron Livne <ronli@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-14 23:48:48 -07:00
Steve Wise
14cc180f7b RDMA/cxgb3: Add support for protocol statistics
- Add a new rdma ctl command called RDMA_GET_MIB to the cxgb3 low
  level driver to obtain the protocol mib from the rnic hardware.

- Add new iw_cxgb3 provider method to get the MIB from the low level
  driver.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-14 23:48:48 -07:00
Steve Wise
7f624d023b RDMA/core: Add iWARP protocol statistics attributes in sysfs
This patch adds a sysfs attribute group called "proto_stats" under
/sys/class/infiniband/$device/ and populates this group with protocol
statistics if they exist for a given device.  Currently, only iWARP
stats are defined, but the code is designed to allow InfiniBand
protocol stats if they become available.  These stats are per-device
and more importantly -not- per port.

Details:

- Add union rdma_protocol_stats in ib_verbs.h.  This union allows
  defining transport-specific stats.  Currently only iwarp stats are
  defined.

- Add struct iw_protocol_stats to define the current set of iwarp
  protocol stats.

- Add new ib_device method called get_proto_stats() to return protocol
  statistics.

- Add logic in core/sysfs.c to create iwarp protocol stats attributes
  if the device is an RNIC and has a get_proto_stats() method.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-14 23:48:48 -07:00
Roland Dreier
a7d834c4bc IPoIB/cm: Fix racy use of receive WR/SGL in ipoib_cm_post_receive_nonsrq()
For devices that don't support SRQs, ipoib_cm_post_receive_nonsrq() is
called from both ipoib_cm_handle_rx_wc() and ipoib_cm_nonsrq_init_rx(),
and these two callers are not synchronized against each other.
However, ipoib_cm_post_receive_nonsrq() always reuses the same receive
work request and scatter list structures, so multiple callers can end
up stepping on each other, which leads to posting garbled work
requests.

Fix this by having the caller pass in the ib_recv_wr and ib_sge
structures to use, and allocating new local structures in
ipoib_cm_nonsrq_init_rx().

Based on a patch by Pradeep Satyanarayana <pradeep@us.ibm.com> and
David Wilder <dwilder@us.ibm.com>, with debugging help from Hoang-Nam
Nguyen <hnguyen@de.ibm.com>.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-14 23:48:47 -07:00
Roland Dreier
468f2239bc RDMA/cma: Add missing newlines to printk()s
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Acked-by: Sean Hefty <sean.hefty@intel.com>
2008-07-14 23:48:47 -07:00
Roland Dreier
eec8845d29 RDMA/cxgb3: Remove write-only iwch_rnic_attributes fields
The members struct iwch_rnic_attributes.vendor_id and .vendor_part_id
are write-only, so we might as well get rid of them.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
Acked-by: Steve Wise <swise@opengridcomputing.com>
2008-07-14 23:48:47 -07:00
Steve Wise
97d1cc8055 RDMA/cxgb3: Fix up some ib_device_attr fields
- set fw_ver
- set hw_ver
- set max_qp_wr to something reasonable
- set max_cqe to something reasonable

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-14 23:48:47 -07:00
Stefan Roscher
6f7bc01a73 IB/ehca: In case of lost interrupts, trigger EOI to reenable interrupts
During corner case testing, we noticed that some versions of ehca do
not properly transition to interrupt done in special load situations.
This can be resolved by periodically triggering EOI through H_EOI, if
EQEs are pending.

Signed-off-by: Stefan Roscher <stefan.roscher@de.ibm.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-14 23:48:47 -07:00
Joachim Fenkes
3e255eac56 IB/ehca: Reject receive work requests if QP is in RESET state
Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-14 23:48:47 -07:00
Roland Dreier
7c27f35820 IB/mlx4: Remove extra code for RESET->ERR QP state transition
Commit 65adfa91 ("IB/mlx4: Fix RESET to RESET and RESET to ERROR
transitions") added some extra code to handle a QP state transition
from RESET to ERROR.  However, the latest 1.2.1 version of the IB spec
has clarified that this transition is actually not allowed, so we can
remove this extra code again.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-14 23:48:46 -07:00
Roland Dreier
d3809ad097 IB/mthca: Remove extra code for RESET->ERR QP state transition
Commit b18aad71 ("IB/mthca: Fix RESET to ERROR transition") added some
extra code to handle a QP state transition from RESET to ERROR.
However, the latest 1.2.1 version of the IB spec has clarified that
this transition is actually not allowed, so we can remove this extra
code again.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-14 23:48:46 -07:00
Ralph Campbell
e5a5e7d59a IB/core: Reset to error QP state transition is not allowed
I was reviewing the QP state transition diagram in the IB 1.2.1 spec
and the code for qp_state_table[], and noticed that the code allows a
QP to be modified from IB_QPS_RESET to IB_QPS_ERR whereas the notes
for figure 124 (pg 457) specifically says that this transition isn't
allowed.  This is a clarification from earlier versions of the IB
spec, which were ambiguous in this area and suggested that the RESET
to ERR transition was allowed.

Fix up the qp_state_table[] to make RESET->ERR not allowed.

Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-14 23:48:46 -07:00
Eli Cohen
6578cf3398 IB/mlx4: Pass congestion management class MADs to the HCA
ConnectX HCAs support the IB_MGMT_CLASS_CONG_MGMT management class, so
process MADs of this class through the MAD_IFC firmware command.

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-14 23:48:45 -07:00
Eli Cohen
d1f2cd895f IB/mlx4: Configure QPs' max message size based on real device capability
ConnectX returns the max message size it supports through the
QUERY_DEV_CAP firmware command.  When modifying a QP to RTR, the max
message size for the QP must be specified.  This value must not exceed
the value declared through QUERY_DEV_CAP.  The current code ignores
the max allowed size and unconditionally sets the value to 2^31.  This
patch sets all QPs to the max value allowed as returned from firmware.

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-14 23:48:45 -07:00
Steve Wise
e7e5582999 RDMA/cxgb3: MEM_MGT_EXTENSIONS support
- set IB_DEVICE_MEM_MGT_EXTENSIONS capability bit if fw supports it.
- set max_fast_reg_page_list_len device attribute.
- add iwch_alloc_fast_reg_mr function.
- add iwch_alloc_fastreg_pbl
- add iwch_free_fastreg_pbl
- adjust the WQ depth for kernel mode work queues to account for
  fastreg possibly taking 2 WR slots.
- add fastreg_mr work request support.
- add local_inv work request support.
- add send_with_inv and send_with_se_inv work request support.
- removed useless duplicate enums/defines for TPT/MW/MR stuff.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-14 23:48:45 -07:00
Steve Wise
00f7ec36c9 RDMA/core: Add memory management extensions support
This patch adds support for the IB "base memory management extension"
(BMME) and the equivalent iWARP operations (which the iWARP verbs
mandates all devices must implement).  The new operations are:

 - Allocate an ib_mr for use in fast register work requests.

 - Allocate/free a physical buffer lists for use in fast register work
   requests.  This allows device drivers to allocate this memory as
   needed for use in posting send requests (eg via dma_alloc_coherent).

 - New send queue work requests:
   * send with remote invalidate
   * fast register memory region
   * local invalidate memory region
   * RDMA read with invalidate local memory region (iWARP only)

Consumer interface details:

 - A new device capability flag IB_DEVICE_MEM_MGT_EXTENSIONS is added
   to indicate device support for these features.

 - New send work request opcodes IB_WR_FAST_REG_MR, IB_WR_LOCAL_INV,
   IB_WR_RDMA_READ_WITH_INV are added.

 - A new consumer API function, ib_alloc_mr() is added to allocate
   fast register memory regions.

 - New consumer API functions, ib_alloc_fast_reg_page_list() and
   ib_free_fast_reg_page_list() are added to allocate and free
   device-specific memory for fast registration page lists.

 - A new consumer API function, ib_update_fast_reg_key(), is added to
   allow the key portion of the R_Key and L_Key of a fast registration
   MR to be updated.  Consumers call this if desired before posting
   a IB_WR_FAST_REG_MR work request.

Consumers can use this as follows:

 - MR is allocated with ib_alloc_mr().

 - Page list memory is allocated with ib_alloc_fast_reg_page_list().

 - MR R_Key/L_Key "key" field is updated with ib_update_fast_reg_key().

 - MR made VALID and bound to a specific page list via
   ib_post_send(IB_WR_FAST_REG_MR)

 - MR made INVALID via ib_post_send(IB_WR_LOCAL_INV),
   ib_post_send(IB_WR_RDMA_READ_WITH_INV) or an incoming send with
   invalidate operation.

 - MR is deallocated with ib_dereg_mr()

 - page lists dealloced via ib_free_fast_reg_page_list().

Applications can allocate a fast register MR once, and then can
repeatedly bind the MR to different physical block lists (PBLs) via
posting work requests to a send queue (SQ).  For each outstanding
MR-to-PBL binding in the SQ pipe, a fast_reg_page_list needs to be
allocated (the fast_reg_page_list is owned by the low-level driver
from the consumer posting a work request until the request completes).
Thus pipelining can be achieved while still allowing device-specific
page_list processing.

The 32-bit fast register memory key/STag is composed of a 24-bit index
and an 8-bit key.  The application can change the key each time it
fast registers thus allowing more control over the peer's use of the
key/STag (ie it can effectively be changed each time the rkey is
rebound to a page list).

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-14 23:48:45 -07:00
Eli Cohen
f89271da32 IPoIB: Copy small received SKBs in connected mode
The connected mode implementation in the IPoIB driver has a large
overhead in the way SKBs are handled in the receive flow.  It usually
allocates an SKB with as big as was used in the currently received SKB
and moves unused fragments from the old SKB to the new one. This
involves a loop on all the remaining fragments and incurs overhead on
the CPU.  This patch, for small SKBs, allocates an SKB just large
enough to contain the received data and copies to it the data from the
received SKB.  The newly allocated SKB is passed to the stack and the
old SKB is reposted.

When running netperf, UDP small messages, without this pach I get:

    UDP UNIDIRECTIONAL SEND TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to
    14.4.3.178 (14.4.3.178) port 0 AF_INET
    Socket  Message  Elapsed      Messages
    Size    Size     Time         Okay Errors   Throughput
    bytes   bytes    secs            #      #   10^6bits/sec

    114688     128   10.00     5142034      0     526.31
    114688           10.00     1130489            115.71

With this patch I get both send and receive at ~315 mbps.

The reason that send performance actually slows down is as follows:
When using this patch, the overhead of the CPU for handling RX packets
is dramatically reduced.  As a result, we do not experience RNR NAK
messages from the receiver which cause the connection to be closed and
reopened again; when the patch is not used, the receiver cannot handle
the packets fast enough so there is less time to post new buffers and
hence the mentioned RNR NACKs.  So what happens is that the
application *thinks* it posted a certain number of packets for
transmission but these packets are flushed and do not really get
transmitted.  Since the connection gets opened and closed many times,
each time netperf gets the CPU time that otherwise would have been
given to IPoIB to actually transmit the packets.  This can be verified
when looking at the port counters -- the output of ifconfig and the
oputput of netperf (this is for the case without the patch):

    tx packets
    ==========
    port counter:   1,543,996
    ifconfig:       1,581,426
    netperf:        5,142,034

    rx packets
    ==========
    netperf         1,1304,089

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
2008-07-14 23:48:44 -07:00
Roland Dreier
f3781d2e89 RDMA: Remove subversion $Id tags
They don't get updated by git and so they're worse than useless.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-14 23:48:44 -07:00
Dotan Barak
4deccd6d95 RDMA: Improve include file coding style
Remove subversion $Id lines and improve readability by fixing other
coding style problems pointed out by checkpatch.pl.

Signed-off-by: Dotan Barak <dotanba@gmail.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-14 23:48:44 -07:00
Robert P. J. Day
fd91b1bf1b IB/ipath: Simplify code using ARRAY_SIZE() macro
Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-14 23:48:44 -07:00
Eli Cohen
9670e55391 IB/mlx4: Optimize QP stamping
The idea is that for QPs with fixed size work requests (eg selective
signaling QPs), before stamping the WQE, we read the value of the DS
field, which gives the effective size of the descriptor as used in the
previous post.  Then we stamp only that area, since the rest of the
descriptor is already stamped.

When initializing the send queue buffer, make sure the DS field is
initialized to the max descriptor size so that the subsequent stamping
will be done on the entire descriptor area.

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-14 23:48:44 -07:00
Moni Shoua
164ba0893c IB/sa: Fail requests made while creating new SM AH
This patch solves a race that occurs after an event occurs that causes
the SA query module to flush its SM address handle (AH).  When SM AH
becomes invalid and needs an update it is handled by the global
workqueue.  On the other hand this event is also handled in the IPoIB
driver by queuing work in the ipoib_workqueue that does multicast
joins.  Although queuing is in the right order, it is done to 2
different workqueues and so there is no guarantee that the first to be
queued is the first to be executed.

This causes a problem because IPoIB may end up sending an request to
the old SM, which will take a long time to time out (since the old SM
is gone); this leads to a much longer than necessary interruption in
multicast traffer.

The patch sets the SA query module's SM AH to NULL when the event
occurs, and until update_sm_ah() is done, any request that needs sm_ah
fails with -EAGAIN return status.

For consumers, the patch doesn't make things worse.  Before the patch,
MADs are sent to the wrong SM so the request gets lost.  Consumers can
be improved if they examine the return code and respond to EAGAIN
properly but even without an improvement the situation is not getting
worse.

Signed-off-by: Moni Levy <monil@voltaire.com>
Signed-off-by: Moni Shoua <monis@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-14 23:48:43 -07:00
Sean Hefty
a947491709 RDMA: Fix license text
The license text for several files references a third software license
that was inadvertently copied in.  Update the license to what was
intended.  This update was based on a request from HP.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-14 23:48:43 -07:00
Christophe Jaillet
929555a2ba RDMA/nes: Remove unnecessary memset()
Remove an explicit memset(..., 0, ...) of a 'listener' structure
allocated with kzalloc().

Signed-off-by: Christophe Jaillet <christophe.jaillet@wanadoo.fr>
Acked-by: Faisal Latif <faisal@neteffect.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-14 23:48:43 -07:00
Roland Dreier
969a60f9db IB/srp: Remove use of cached P_Key/GID queries
The SRP initiator is currently using ib_find_cached_pkey() and
ib_get_cached_gid() in situations where the uncached ib_find_pkey()
and ib_query_gid() functions serve just as well: sleeping is allowed
and performance is not an issue.  Since we want to eliminate the
cached operations in the long term, convert SRP to use the uncached
variants.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-14 23:48:43 -07:00
Artem Bityutskiy
e56a99d5a4 UBIFS: add brief documentation
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Signed-off-by: Adrian Hunter <ext-adrian.hunter@nokia.com>
2008-07-15 09:22:12 +03:00
Artem Bityutskiy
9527056630 MAINTAINERS: add UBIFS section
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
2008-07-15 09:22:05 +03:00
Pavel Emelyanov
f66ac03d49 mib: add struct net to ICMPMSGIN_INC_STATS_BH
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-14 23:05:31 -07:00
Pavel Emelyanov
903fc1964e mib: add struct net to ICMPMSGOUT_INC_STATS
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-14 23:05:30 -07:00
Pavel Emelyanov
dcfc23cac1 mib: add struct net to ICMP_INC_STATS_BH
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-14 23:05:29 -07:00
Pavel Emelyanov
75c939bb4d mib: add struct net to ICMP_INC_STATS
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-14 23:05:28 -07:00
Pavel Emelyanov
43589aa93c icmp: drop unused MIB accounting wrappers
There are ICMP_XXX_STATS that are not used in the kernel, so I remove
them, not to "just patch" them later. But if there's some sense in
keeping them, kick me - I will remake this set keeping them.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-14 23:05:26 -07:00
Pavel Emelyanov
fd54d716b1 inet: toss struct net initialization around
Some places, that deal with ICMP statistics already have where
to get a struct net from, but use it directly, without declaring
a separate variable on the stack.

Since I will need this net soon, I declare a struct net on the
stack and use it in the existing places in a separate patch not
to spoil the future ones.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-14 23:05:26 -07:00
Pavel Emelyanov
0388b00426 icmp: add struct net argument to icmp_out_count
This routine deals with ICMP statistics, but doesn't have a
struct net at hands, so add one.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-14 23:05:13 -07:00
Patrick McHardy
61362766d7 vlan: remove unnecessary include statements
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-14 22:51:55 -07:00
Patrick McHardy
d80aa31bbf vlan: clean up hard_start_xmit functions
Remove excessive comments and debugging, use NETDEV_TX codes,
remove some empty lines.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-14 22:51:39 -07:00
Patrick McHardy
1349fe9a6b vlan: clean up vlan_dev_hard_header()
Remove some debugging and excessive comments, merge the two
dev_hard_header calls into one.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-14 22:51:19 -07:00
Patrick McHardy
19b9a4e256 vlan: ethtool ->get_flags support
Allow to query LRO settings of underlying device when VLAN RX
acceleration is used.

Suggested by Ben Hutchings <bhutchings@solarflare.com>.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-14 22:51:01 -07:00
Patrick McHardy
393e52e33c packet: deliver VLAN TCI to userspace
Store the VLAN tag in the auxillary data/tpacket2_hdr so userspace can
properly deal with hardware VLAN tagging/stripping.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-14 22:50:39 -07:00
Patrick McHardy
bbd6ef87c5 packet: support extensible, 64 bit clean mmaped ring structure
The tpacket_hdr is not 64 bit clean due to use of an unsigned long
and can't be extended because the following struct sockaddr_ll needs
to be at a fixed offset.

Add support for a version 2 tpacket protocol that removes these
limitations.

Userspace can query the header size through a new getsockopt option
and change the protocol version through a setsockopt option. The
changes needed to switch to the new protocol version are:

1. replace struct tpacket_hdr by struct tpacket2_hdr
2. query header len and save
3. set protocol version to 2
 - set up ring as usual
4. for getting the sockaddr_ll, use (void *)hdr + TPACKET_ALIGN(hdrlen)
   instead of (void *)hdr + TPACKET_ALIGN(sizeof(struct tpacket_hdr))

Steps 2 and 4 can be omitted if the struct sockaddr_ll isn't needed.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-14 22:50:15 -07:00
Patrick McHardy
bc1d0411b8 vlan: deliver packets received with VLAN acceleration to network taps
When VLAN header stripping is used, packets currently bypass packet
sockets (and other network taps) completely. For locally existing
VLANs, they appear directly on the VLAN device, for unknown VLANs
they are silently dropped.

Add a new function netif_nit_deliver() to deliver incoming packets
to all network interface taps and use it in __vlan_hwaccel_rx() to
make VLAN packets visible on the underlying device.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-14 22:49:30 -07:00
Patrick McHardy
6aa895b047 vlan: Don't store VLAN tag in cb
Use a real skb member to store the skb to avoid clashes with qdiscs,
which are allowed to use the cb area themselves. As currently only real
devices that consume the skb set the NETIF_F_HW_VLAN_TX flag, no explicit
invalidation is neccessary.

The new member fills a hole on 64 bit, the skb layout changes from:

        __u32                      mark;                 /*   172     4 */
        sk_buff_data_t             transport_header;     /*   176     4 */
        sk_buff_data_t             network_header;       /*   180     4 */
        sk_buff_data_t             mac_header;           /*   184     4 */
        sk_buff_data_t             tail;                 /*   188     4 */
        /* --- cacheline 3 boundary (192 bytes) --- */
        sk_buff_data_t             end;                  /*   192     4 */

        /* XXX 4 bytes hole, try to pack */

to

        __u32                      mark;                 /*   172     4 */
        __u16                      vlan_tci;             /*   176     2 */

        /* XXX 2 bytes hole, try to pack */

        sk_buff_data_t             transport_header;     /*   180     4 */
        sk_buff_data_t             network_header;       /*   184     4 */

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-14 22:49:06 -07:00