Commit graph

161832 commits

Author SHA1 Message Date
Christof Schmitt
799b76d09a [SCSI] zfcp: Decouple gid_pn requests from erp
Don't let the erp wait for gid_pn requests to complete. Instead, queue
the gid_pn work, exit erp and let the finished gid_pn work trigger a
new port reopen.

Reviewed-by: Swen Schillig <swen@vnet.ibm.com>
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2009-09-05 08:49:28 -05:00
Swen Schillig
564e1c86c8 [SCSI] zfcp: Move qdio related data out of zfcp_adapter
The zfcp_adapter structure was growing over time to a size of almost
one memory page. To reduce the size of the data structure and to
seperate different layers, put all qdio related data in the new
zfcp_qdio data structure.

Signed-off-by: Swen Schillig <swen@vnet.ibm.com>
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2009-09-05 08:49:27 -05:00
Swen Schillig
42428f747a [SCSI] zfcp: Separate qdio attributes from zfcp_fsf_req
Split all qdio related attributes out of zfcp_fsf_req and put it in
new structure.

Signed-off-by: Swen Schillig <swen@vnet.ibm.com>
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2009-09-05 08:49:24 -05:00
Swen Schillig
4544683a4b [SCSI] zfcp: Move workqueue to adapter struct
Remove the global driver work queue and replace it with a workqueue
local to the adapter. The usage of this workqueue makes this the
correct place for the structure. In addition multiple adapters won't
block each other due to the serialization of the queued work.

Signed-off-by: Swen Schillig <swen@vnet.ibm.com>
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2009-09-05 08:49:23 -05:00
Swen Schillig
09a46c6e34 [SCSI] zfcp: Remove the useless ZFCP_REQ_AUTO_CLEANUP flag
The flag ZFCP_REQ_AUTO_CLEANUP was useless as the
ZFCP_STATUS_FSFREQ_CLEANUP flag is there for exactly the same purpose.

Signed-off-by: Swen Schillig <swen@vnet.ibm.com>
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2009-09-05 08:49:21 -05:00
Swen Schillig
a4623c467f [SCSI] zfcp: Improve request allocation through mempools
Remove the special case for NO_QTCB requests and optimize the
mempool and cache processing for fsfreqs. Especially use seperate
mempools for the zfcp_fsf_req and zfcp_qtcb structs.

Signed-off-by: Swen Schillig <swen@vnet.ibm.com>
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2009-09-05 08:49:20 -05:00
Swen Schillig
058b864789 [SCSI] zfcp: Replace fsf_req wait_queue with completion
The combination wait_queue/wakeup in conjunction with the flag
ZFCP_STATUS_FSFREQ_COMPLETED to signal the completion of an fsfreq
was not race-safe and can be better solved by a completion.

Signed-off-by: Swen Schillig <swen@vnet.ibm.com>
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2009-09-05 08:49:18 -05:00
Swen Schillig
bd63eaf4b8 [SCSI] zfcp: fix layering oddities between zfcp_fsf and zfcp_qdio
There is no need for the QDIO layer to have knowledge or do things
wich are done better by the FSF layer and vice versa.  Straighten a
few things to improve vividness.

Signed-off-by: Swen Schillig <swen@vnet.ibm.com>
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2009-09-05 08:49:16 -05:00
Christof Schmitt
55c770fa11 [SCSI] zfcp: Implicitly close all wka ports
An adapter shutdown implicitly closes all open ports. Make sure to
mark all WKA ports as offline, not only the directory server. Also
make sure that no pending wka port work is running when the adapter is
being removed.

Reviewed-by: Swen Schillig <swen@vnet.ibm.com>
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2009-09-05 08:49:15 -05:00
Christof Schmitt
14e242ea55 [SCSI] zfcp: Only issue one test link command per port
When the FCP channel returns a series of commands with the error
status "test link", zfcp will send a series of ELS ADISC commands.
This is technically no problem, but it is enough to only issue one
test command per remote port. So, track whether a ELS ADISC command is
already pending, and do not send a new one if there is already a
pending command.

Reviewed-by: Swen Schillig <swen@vnet.ibm.com>
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2009-09-05 08:49:13 -05:00
Christof Schmitt
44f09f7376 [SCSI] zfcp: Remove useless assignment
Using a bitwise OR to not set anything at all is pointless so remove
the useless statement.

Reviewed-by: Swen Schillig <swen@vnet.ibm.com>
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2009-09-05 08:49:12 -05:00
Christof Schmitt
2e261af84c [SCSI] zfcp: Only collect FSF/HBA debug data for matching trace levels
The default trace level is to only trace failed FSF commands. Thus it
is not necessary to collect trace data for most FSF commands, since
it will be thrown away later. Restructure the FSF/HBA trace
infrastructure to first check the trace level in a inline function and
only do the expensive data collection for matching trace levels.

Reviewed-by: Swen Schillig <swen@vnet.ibm.com>
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2009-09-05 08:49:10 -05:00
Christof Schmitt
dcd20e2316 [SCSI] zfcp: Only collect SCSI debug data for matching trace levels
The default trace level is to only trace failed SCSI commands. Thus it
is not necessary to collect trace data for most SCSI commands since it
will be thrown away later. Restructure the SCSI trace infrastructure
to first check the trace level in a inline function and only do the
expensive data collection for matching trace levels.

Reviewed-by: Swen Schillig <swen@vnet.ibm.com>
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2009-09-05 08:49:08 -05:00
Christof Schmitt
d46f384a89 [SCSI] zfcp: Move debug data from zfcp_data to own data structure
The struct zfcp_adapter includes everything related to the debug
traces. This introduces dependences between the definitions in
zfcp_def.h and zfcp_dbf.h. Move all debug related data structures to a
new data structure to break those dependencies and manage the debug
data in zfcp_dbf.[hc].

Reviewed-by: Swen Schillig <swen@vnet.ibm.com>
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2009-09-05 08:49:06 -05:00
Swen Schillig
a67417ab7e [SCSI] zfcp: invalid usage after free of port resources
In certain error scenarios ports, rports are getting attached,
validated and removed from the systems environment. Depending on the
layer this occurs asynchronously. This patch fixes the few races
which existed and ensures all references and cross references are
cleared at the time they're invalid. In addition fc transports
actions are only scheduled when required.

Signed-off-by: Swen Schillig <swen@vnet.ibm.com>
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2009-09-05 08:49:04 -05:00
Yanqing_Liu@Dell.com
cdf69bb91b [SCSI] scsi_dh_rdac: add support for next generation of Dell PV array
This patch is to add DM support for next generation of Dell PowerVault
storage array.

Signed-off-by: Yanqing Liu <Yanqing_Liu@Dell.com>
Acked-by: Chandra Seetharaman <sekharan@us.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2009-09-05 08:46:09 -05:00
Jan Glauber
81bd5f6c96 crypto: sha-s390 - Fix warnings in import function
That patch should fix the warnings.

Signed-off-by: Jan Glauber <jang@linux.vnet.ibm.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2009-09-05 16:27:35 +10:00
Patrick McHardy
c9f1d0389b net_sched: fix class grafting errno codes
If the parent qdisc doesn't support classes, use EOPNOTSUPP.
If the parent class doesn't exist, use ENOENT. Currently EINVAL
is returned in both cases.

Additionally check whether grafting is supported and remove a now
unnecessary graft function from sch_ingress.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-04 23:10:15 -07:00
Brian Haley
b1f5719558 netlink: silence compiler warning
CC      net/netlink/genetlink.o
net/netlink/genetlink.c: In function ‘genl_register_mc_group’:
net/netlink/genetlink.c:139: warning: ‘err’ may be used uninitialized in this function

From following the code 'err' is initialized, but set it to zero to
silence the warning.

Signed-off-by: Brian Haley <brian.haley@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-04 20:36:52 -07:00
Steven Rostedt
85bac32c4a ring-buffer: only enable ring_buffer_swap_cpu when needed
Since the ability to swap the cpu buffers adds a small overhead to
the recording of a trace, we only want to add it when needed.

Only the irqsoff and preemptoff tracers use this feature, and both are
not recommended for production kernels. This patch disables its use
when neither irqsoff nor preemptoff is configured.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-09-04 19:42:22 -04:00
Steven Rostedt
62f0b3eb5c ring-buffer: check for swapped buffers in start of committing
Because the irqsoff tracer can swap an internal CPU buffer, it is possible
that a swap happens between the start of the write and before the committing
bit is set (the committing bit will disable swapping).

This patch adds a check for this and will fail the write if it detects it.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-09-04 19:38:42 -04:00
Steven Rostedt
e8165dbb03 tracing: report error in trace if we fail to swap latency buffer
The irqsoff tracer will fail to swap the cpu buffer with the max
buffer if it preempts a commit. Instead of ignoring this, this patch
makes the tracer report it if the last max latency failed due to preempting
a current commit.

The output of the latency tracer will look like this:

 # tracer: irqsoff
 #
 # irqsoff latency trace v1.1.5 on 2.6.31-rc5
 # --------------------------------------------------------------------
 # latency: 112 us, #1/1, CPU#1 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:4)
 #    -----------------
 #    | task: -4281 (uid:0 nice:0 policy:0 rt_prio:0)
 #    -----------------
 #  => started at: save_args
 #  => ended at:   __do_softirq
 #
 #
 #                  _------=> CPU#
 #                 / _-----=> irqs-off
 #                | / _----=> need-resched
 #                || / _---=> hardirq/softirq
 #                ||| / _--=> preempt-depth
 #                |||| /
 #                |||||     delay
 #  cmd     pid   ||||| time  |   caller
 #     \   /      |||||   \   |   /
    bash-4281    1d.s6  265us : update_max_tr_single: Failed to swap buffers due to commit in progress

Note the latency time and the functions that disabled the irqs or preemption
will still be listed.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-09-04 19:22:41 -04:00
Steven Rostedt
659372d3e4 tracing: add trace_array_printk for internal tracers to use
This patch adds a trace_array_printk to allow a tracer to use the
trace_printk on its own trace array.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-09-04 19:13:53 -04:00
Steven Rostedt
e77405ad80 tracing: pass around ring buffer instead of tracer
The latency tracers (irqsoff and wakeup) can swap trace buffers
on the fly. If an event is happening and has reserved data on one of
the buffers, and the latency tracer swaps the global buffer with the
max buffer, the result is that the event may commit the data to the
wrong buffer.

This patch changes the API to the trace recording to be recieve the
buffer that was used to reserve a commit. Then this buffer can be passed
in to the commit.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-09-04 18:59:39 -04:00
Steven Rostedt
f633903af2 tracing: make tracing_reset safe for external use
Reseting the trace buffer without first disabling the buffer and
waiting for any writers to complete, can corrupt the ring buffer.

This patch makes the external version of tracing_reset safe from
corruption by disabling the ring buffer and calling synchronize_sched.

This version can no longer be called from interrupt context. But all those
callers have been removed.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-09-04 18:46:51 -04:00
Steven Rostedt
2f26ebd549 tracing: use timestamp to determine start of latency traces
Currently the latency tracers reset the ring buffer. Unfortunately
if a commit is in process (due to a trace event), this can corrupt
the ring buffer. When this happens, the ring buffer will detect
the corruption and then permanently disable the ring buffer.

The bug does not crash the system, but it does prevent further tracing
after the bug is hit.

Instead of reseting the trace buffers, the timestamp of the start of
the trace is used instead. The buffers will still contain the previous
data, but the output will not count any data that is before the
timestamp of the trace.

Note, this only affects the static trace output (trace) and not the
runtime trace output (trace_pipe). The runtime trace output does not
make sense for the latency tracers anyway.

Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-09-04 18:44:22 -04:00
Vlad Yasevich
f1751c57f7 sctp: Catch bogus stream sequence numbers
Since our TSN map is capable of holding at most a 4K chunk gap,
there is no way that during this gap, a stream sequence number
(unsigned short) can wrap such that the new number is smaller
then the next expected one.  If such a case is encountered,
this is a protocol violation.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
2009-09-04 18:21:03 -04:00
Wei Yongjun
be2971438d sctp: remove dup code in net/sctp/output.c
Use sctp_packet_reset() instead of dup code.

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
2009-09-04 18:21:02 -04:00
Wei Yongjun
9237ccbc0b sctp: turn flags in 'struct sctp_association' into bit fields
This shrinks the size of struct sctp_association a little.

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
2009-09-04 18:21:02 -04:00
Bhaskar Dutta
723884339f sctp: Sysctl configuration for IPv4 Address Scoping
This patch introduces a new sysctl option to make IPv4 Address Scoping
configurable <draft-stewart-tsvwg-sctp-ipv4-00.txt>.

In networking environments where DNAT rules in iptables prerouting
chains convert destination IP's to link-local/private IP addresses,
SCTP connections fail to establish as the INIT chunk is dropped by the
kernel due to address scope match failure.
For example to support overlapping IP addresses (same IP address with
different vlan id) a Layer-5 application listens on link local IP's,
and there is a DNAT rule that maps the destination IP to a link local
IP. Such applications never get the SCTP INIT if the address-scoping
draft is strictly followed.

This sysctl configuration allows SCTP to function in such
unconventional networking environments.

Sysctl options:
0 - Disable IPv4 address scoping draft altogether
1 - Enable IPv4 address scoping (default, current behavior)
2 - Enable address scoping but allow IPv4 private addresses in init/init-ack
3 - Enable address scoping but allow IPv4 link local address in init/init-ack

Signed-off-by: Bhaskar Dutta <bhaskar.dutta@globallogic.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
2009-09-04 18:21:01 -04:00
Vlad Yasevich
8da645e101 sctp: Get rid of an extra routing lookup when adding a transport.
We used to perform 2 routing lookups for a new transport: one
just for path mtu detection, and one to actually route to destination
and path mtu update when sending a packet.  There is no point in doing
both of them, especially since the first one just for path mtu doesn't
take into account source address and sometimes gives the wrong route,
causing path mtu updates anyway.

We now do just the one call to do both route to destination and get
path mtu updates.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
2009-09-04 18:21:01 -04:00
Vlad Yasevich
a803c94230 sctp: Turn flags in 'sctp_packet' into bit fields
This shrinks the size of sctp_packet a little.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
2009-09-04 18:21:01 -04:00
Vlad Yasevich
4007cc88ce sctp: Correctly track if AUTH has been bundled.
We currently track if AUTH has been bundled using the 'auth'
pointer to the chunk.  However, AUTH is disallowed after DATA
is already in the packet, so we need to instead use the
'has_auth' field.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
2009-09-04 18:21:00 -04:00
Wei Yongjun
d521c08f4c sctp: fix to reset packet information after packet transmit
The packet information does not reset after packet transmit, this
may cause some problems such as following DATA chunk be sent without
AUTH chunk, even if the authentication of DATA chunk has been
requested by the peer.

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
2009-09-04 18:21:00 -04:00
Vlad Yasevich
31b02e1549 sctp: Failover transmitted list on transport delete
Add-IP feature allows users to delete an active transport.  If that
transport has chunks in flight, those chunks need to be moved to another
transport or association may get into unrecoverable state.

Reported-by: Rafael Laufer <rlaufer@cisco.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
2009-09-04 18:21:00 -04:00
Vlad Yasevich
f68b2e05f3 sctp: Fix SCTP_MAXSEG socket option to comply to spec.
We had a bug that we never stored the user-defined value for
MAXSEG when setting the value on an association.  Thus future
PMTU events ended up re-writing the frag point and increasing
it past user limit.  Additionally, when setting the option on
the socket/endpoint, we effect all current associations, which
is against spec.

Now, we store the user 'maxseg' value along with the computed
'frag_point'.  We inherit 'maxseg' from the socket at association
creation and use it as an upper limit for 'frag_point' when its
set.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
2009-09-04 18:21:00 -04:00
Vlad Yasevich
cb95ea32a4 sctp: Don't do NAGLE delay on large writes that were fragmented small
SCTP will delay the last part of a large write due to NAGLE, if that
part is smaller then MTU.  Since we are doing large writes, we might
as well send the last portion now instead of waiting untill the next
large write happens.  The small portion will be sent as is regardless,
so it's better to not delay it.

This is a result of much discussions with Wei Yongjun <yjwei@cn.fujitsu.com>
and Doug Graham <dgraham@nortel.com>.  Many thanks go out to them.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
2009-09-04 18:20:59 -04:00
Vlad Yasevich
b29e790728 sctp: Nagle delay should be based on path mtu
The decision to delay due to Nagle should be based on the path mtu
and future packet size.  We currently incorrectly base it on
'frag_point' which is the SCTP DATA segment size, and also we do
not count DATA chunk header overhead in the computation.  This
actuall allows situations where a user can set low 'frag_point',
and then send small messages without delay.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
2009-09-04 18:20:59 -04:00
Vlad Yasevich
d4d6fb5787 sctp: Try not to change a_rwnd when faking a SACK from SHUTDOWN.
We currently set a_rwnd to 0 when faking a SACK from SHUTDOWN.
This results in an hung association if the remote only uses
SHUTDOWNs (which it's allowed to do) to acknowlege DATA when
closing.  The reason for that is that we simply honor the a_rwnd
from the sack, but since we faked it to be 0, we enter 0-window
probing.  The fix is to use the peers old rwnd and add our flight
size to it.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
2009-09-04 18:20:59 -04:00
Vlad Yasevich
4d3c46e683 sctp: drop a_rwnd to 0 when receive buffer overflows.
SCTP has a problem that when small chunks are used, it is possible
to exhaust the receiver buffer without fully closing receive window.
This happens due to all overhead that we have account for with small
messages.  To fix this, when receive buffer is exceeded, we'll drop
the window to 0 and save the 'drop' portion.  When application starts
reading data and freeing up recevie buffer space, we'll wait until
we've reached the 'drop' window and then add back this 'drop' one
mtu at a time.  This worked well in testing and under stress produced
rather even recovery.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
2009-09-04 18:20:59 -04:00
Vlad Yasevich
33ce828131 sctp: Clear fast_recovery on the transport when T3 timer expires.
If T3 timer expires, we are retransmitting data due to timeout any
any fast recovery is null and void.  We can clear the fast recovery
flag.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
2009-09-04 18:20:58 -04:00
Vlad Yasevich
b9f8478682 sctp: Fix error count increments that were results of HEARTBEATS
SCTP RFC 4960 states that unacknowledged HEARTBEATS count as
errors agains a given transport or endpoint.  As such, we
should increment the error counts for only for unacknowledged
HB, otherwise we detect failure too soon.  This goes for both
the overall error count and the path error count.

Now, there is a difference in how the detection is done
between the two.  The path error detection is done after
the increment, so to detect it properly, we actually need
to exceed the path threshold.  The overall error detection
is done _BEFORE_ the increment.  Thus to detect the failure,
it's enough for the error count to match the threshold.
This is why all the state functions use '>=' to detect failure,
while path detection uses '>'.

Thanks goes to Chunbo Luo <chunbo.luo@windriver.com> who first
proposed patches to fix this issue and made me re-read the spec
and the code to figure out how this cruft really works.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
2009-09-04 18:20:58 -04:00
Alexey Dobriyan
d71a09ed55 sctp: use proc_create()
create_proc_entry() is deprecated (not formally, though).

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
2009-09-04 18:20:58 -04:00
Wei Yongjun
dadb50cc1a sctp: fix check the chunk length of received HEARTBEAT-ACK chunk
The receiver of the HEARTBEAT should respond with a HEARTBEAT ACK
that contains the Heartbeat Information field copied from the
received HEARTBEAT chunk. So the received HEARTBEAT-ACK chunk
must have a length of:
  sizeof(sctp_chunkhdr_t) + sizeof(sctp_sender_hb_info_t)

A badly formatted HB-ACK chunk, it is possible that we may access
invalid memory.  We should really make sure that the chunk format
is what we expect, before attempting to touch the data.

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
2009-09-04 18:20:58 -04:00
Wei Yongjun
a2f36eec56 sctp: drop SHUTDOWN chunk if the TSN is less than the CTSN
If Cumulative TSN Ack field of SHUTDOWN chunk is less than the
Cumulative TSN Ack Point then drop the SHUTDOWN chunk.

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
2009-09-04 18:20:57 -04:00
Vlad Yasevich
9c5c62be2f sctp: Send user messages to the lower layer as one
Currenlty, sctp breaks up user messages into fragments and
sends each fragment to the lower layer by itself.  This means
that for each fragment we go all the way down the stack
and back up.  This also discourages bundling of multiple
fragments when they can fit into a sigle packet (ex: due
to user setting a low fragmentation threashold).

We introduce a new command SCTP_CMD_SND_MSG and hand the
whole message down state machine.  The state machine and
the side-effect parser will cork the queue, add all chunks
from the message to the queue, and then un-cork the queue
thus causing the chunks to get transmitted.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
2009-09-04 18:20:57 -04:00
Vlad Yasevich
5d7ff261ef sctp: Try to encourage SACK bundling with DATA.
If the association has a SACK timer pending and now DATA queued
to be send, we'll try to bundle the SACK with the next application send.
As such, try encourage bundling by accounting for SACK in the size
of the first chunk fragment.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
2009-09-04 18:20:56 -04:00
Vlad Yasevich
e83963b769 sctp: Generate SACKs when actually sending outbound DATA
We are now trying to bundle SACKs when we have outbound
DATA to send.  However, there are situations where this
outbound DATA will not be sent (due to congestion or 
available window).  In such cases it's ok to wait for the
timer to expire.  This patch refactors the sending code
so that betfore attempting to bundle the SACK we check
to see if the DATA will actually be transmitted.

Based on eirlier works for Doug Graham <dgraham@nortel.com> and
Wei Youngjun <yjwei@cn.fujitsu.com>.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
2009-09-04 18:20:56 -04:00
Vlad Yasevich
3e62abf92f sctp: Fix data segmentation with small frag_size
Since an application may specify the maximum SCTP fragment size
that all data should be fragmented to, we need to fix how
we do segmentation.   Right now, if a user specifies a small
fragment size, the segment size can go negative in the presence
of AUTH or COOKIE_ECHO bundling.

What we need to do is track the largest possbile DATA chunk that
can fit into the mtu.  Then if the fragment size specified is
bigger then this maximum length, we'll shrink it down.  Otherwise,
we just use the smaller segment size without changing it further.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
2009-09-04 18:20:56 -04:00
Vlad Yasevich
bec9640bb0 sctp: Disallow new connection on a closing socket
If a socket has a lot of association that are in the process of
of being closed/aborted, it is possible for a remote to establish
new associations during the time period that the old ones are shutting
down.  If this was a result of a close() call, there will be no socket
and will cause a memory leak.  We'll prevent this by setting the
socket state to CLOSING and disallow new associations when in this state.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
2009-09-04 18:20:56 -04:00