|
|---|
| SSF.OS.TCP |
| Implementation and Validation Tests |
|
|
| SSF implementation of TCP standards |
|
General: SSF TCP is intended for modeling bulk data transfers, such as Web traffic. It currently provides no special processing for very small segments, and no PUSH or URGENT processing. The design of SSF TCP conforms to the "plug-and-play" architecture provided by the protocol design framework SSF.OS. SSF TCP is fairly modular and is configurable per host from a DML network configuration database. It should be relatively easy to add new processing modules, or to change the existing ones, to model additional TCP features or variants. References:
|
|
0. TCP header |
| RFC requirements implemented in SSF TCP |
|---|
Fields: SOURCE_port, DEST_port, SEQno, ACKno, AdvertisedWnd; flags: SYN, ACK, FIN; nominal header length counted as 20 bytes |
| RFC requirements NOT implemented in SSF TCP |
Flags: URG, PSH, RST; TCP checksum; Urgent pointer; options |
| 1. TCP parameters and their initial values. | |
| RFC requirements / common choices | SSF implementation |
|---|---|
| ISN (Initial Send Sequence Number) |
ISN is DML-configurable with default value 0:
ISS 0
The default initial send sequence number increment is ISS_INCR = 280000. The maximum value of a sequence number is 2**63-1, with NO wrap-around. |
|
SMSS (Sender Maximum Segment Size): size of the largest segment the sender can transmit. RMSS (Receiver Maximum Segment Size): size of the largest segment the receiver is willing to accept. Specified in the MSS option during connection startup. SMSS and RMSS do not include TCP/IP headers and options. If MSS option not used, SMSS = 536 bytes in RFC 1122; 1024 bytes in common implementations. |
SSF TCP does not implement the MSS option, and SMSS = RMSS = MSS.
MSS is DML-configurable with default value of 1024 bytes.
MSS 1024 |
| Send buffer size (in bytes) |
Send buffer size is DML-configurable in units of MSS,
with default value of 16 (16*MSS bytes):
SendBufferSize 16
The send buffer is implemented as a linked list of pseudo-data segments. |
| Receive buffer size (in bytes) |
Receive buffer size is DML-configurable in units of MSS,
with default value of 16 (16*MSS bytes):
RcvWndSize 16
The receive buffer is implemented as a circular array of pseudo-data segments. |
Transmission control sender session state variables, in bytes (RFC 2581):
|
Certain variables are DML-configurable as follows:
|
| IW (sender's initial congestion window after 3-way handshake). RFC2581: less than or equal to 2 * SMSS, no more than two segments. | IW = MSS. |
|
LW (Loss Window = congestion window size after retransmission timeout) is equal to SMSS (RFC2581). RW (Restart Window, congestion window size after TCP connection restart from an idle period) is equal to IW (RFC2581). |
LW = RW = IW = MSS. |
| Maximum retransmission timeout shift (the maximum number of retransmission attempts before a TCP connection gives up) is configurable, with a default value of 12 per [WS95]. |
Implemented as the maximum number of retransmission attempts, DML-configurable
with default value 12:
MaxRexmitTimes 12 |
| TTL (Time to Live) used by the IP layer when sending a TCP packet: 60 in RFC 793, made configurable in RFC 1122. | Default TTL is set to 20 in SSF.OS.IP. It will be made DML-configurable in future releases. |
|
2. TCP clocks and timers. The default values of all timers are from reference [WS95]. | |
| RFC requirements / common choices | SSF implementation |
|---|---|
|
Most TCP implementations use two clocks (tick counters) driven by the operating system; they are used to advance a number of TCP timers. Typically the slow clock advances in steps of 500 ms, and the fast clock advances in steps of 200 ms. There is one instance of each of the 7 timers listed below per TCP session (connection endpoint). |
Both slow and fast clocks are
DML-configurable with default step values in seconds:
TCP_SLOW_INTERVAL 0.500
TCP_FAST_INTERVAL 0.200
Each value can be set with an accuracy of 0.001 s (1 ms). Class tcpSessionMaster maintains private clocks in each host instance; their initial phases are chosen randomly, independently of other hosts. All tcpSessions (connection endpoints) in a host share these clocks, but each session's timers are independent. |
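The effect of a shared tick clock with a random phase on timer firing times can be sketched as follows. This is only an illustration under stated assumptions, not the tcpSessionMaster code: the function names are hypothetical, times are kept as integer milliseconds (matching the 1 ms accuracy above), and a timer is assumed to count whole clock ticks, as in BSD-style implementations.

```python
import random


def random_phase_ms(interval_ms, rng=random):
    """Pick a per-host clock phase in [0, interval_ms) (hypothetical helper)."""
    return rng.randrange(interval_ms)


def timer_fire_time_ms(now_ms, ticks, interval_ms, phase_ms):
    """Time at which a timer set at now_ms for `ticks` clock ticks fires.

    The clock ticks at phase_ms, phase_ms + interval_ms, ... (assumption).
    The timer fires on the `ticks`-th tick strictly after now_ms.
    """
    if ticks < 1:
        raise ValueError("ticks must be >= 1")
    # Index of the first tick strictly after now_ms.
    first_tick = (now_ms - phase_ms) // interval_ms + 1
    return phase_ms + (first_tick + ticks - 1) * interval_ms


# A one-tick timer on the 500 ms slow clock fires after at most 500 ms,
# depending on where in the tick period it was set.
fire = timer_fire_time_ms(now_ms=300, ticks=1, interval_ms=500, phase_ms=100)
```

Under these assumptions a timer set for N ticks fires between (N-1) and N clock intervals after it was set, which is why two hosts with different phases see slightly different effective timeouts.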
| Connection establishment timer: starts when a SYN packet is sent. If a response is not received within 75 seconds, the connection is aborted. | In SSF TCP, SYN packet is treated the same as a data packet, and it uses the retransmission timer. That may be changed in a future release. |
| Retransmission timer: The value of this timer (retransmission timeout, RTO) is calculated dynamically, based on round-trip time (RTT) measurements. |
The SSF implementation follows [WS95]: When a data segment is sent,
start the retransmission timer with the current RTO value, unless the timer is already running.
When an ACK is received: if the ACK is for the last segment sent (no data in flight),
cancel the timer, else restart the timer with the current RTO value.
When a data segment is retransmitted,
start the retransmission timer with twice the latest RTO value, or with maxRTO,
whichever is smaller.
The initial value of RTO is set to 3 seconds. RTO is bounded between 1 and 64 seconds (WS95). |
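The retransmission timer rules above can be sketched as a small state holder. This is a minimal sketch, not the SSF source: the class and method names are hypothetical, and it assumes that the doubled value is stored as the latest RTO, so that successive retransmissions back off exponentially up to the 64-second bound.

```python
INITIAL_RTO = 3.0  # seconds, initial RTO stated above
MIN_RTO = 1.0      # lower RTO bound
MAX_RTO = 64.0     # upper RTO bound (maxRTO)


class RexmitTimer:
    """Hypothetical retransmission timer following the rules listed above."""

    def __init__(self):
        self.rto = INITIAL_RTO
        self.remaining = None  # None means the timer is not running

    def set_rto(self, value):
        # A newly computed RTO is kept within [MIN_RTO, MAX_RTO].
        self.rto = min(max(value, MIN_RTO), MAX_RTO)

    def on_data_sent(self):
        # Start the timer unless it is already running.
        if self.remaining is None:
            self.remaining = self.rto

    def on_ack(self, data_in_flight):
        # Cancel if nothing is in flight, otherwise restart with current RTO.
        self.remaining = self.rto if data_in_flight else None

    def on_retransmit(self):
        # Back off: twice the latest RTO, capped at MAX_RTO (assumption:
        # the backed-off value becomes the latest RTO).
        self.rto = min(2 * self.rto, MAX_RTO)
        self.remaining = self.rto
```

With these assumptions, repeated retransmissions of the same segment produce timeouts of 6, 12, 24, 48 and then 64 seconds, starting from the initial RTO of 3 seconds.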
| Delayed ACK timer is set when TCP receives data that must be acknowledged, but need not be acknowledged immediately. Instead, TCP receiver may wait up to 200 ms (TCP_FAST_INTERVAL) before sending an ACK (according to WS95; RFC 2581 allows 500 ms delay). If during this 200-ms period, TCP has data to send on this connection, the pending acknowledgment is sent along with the data (piggybacking). |
In SSF TCP, delayed-ack processing is optional, and can be selected from DML via
delayed_ack true
The delay value can be selected from DML via TCP_FAST_INTERVAL. |
| Persist timer: set when a TCP session advertises a receive window of size 0, preventing the other end from sending data. | In SSF TCP, it is assumed that the receiver immediately consumes all received data. The persist timer is not implemented. |
| Keepalive timer: fires if connection is idle for 2 hours. | In SSF TCP, the keepalive timer is not implemented. |
| FIN_WAIT_2 timer: is set to prevent a connection from staying in the FIN_WAIT_2 state forever. This timer is set to 10 minutes when the session enters the FIN_WAIT_2 state. When the FIN_WAIT_2 timer fires it is reset to 75 seconds. When it fires again, the connection is dropped. |
In SSF TCP, FIN_WAIT_2 timer is set to maximum idle time. The parameter
MaxIdleTime is DML-configurable with default value of 600 seconds.
MaxIdleTime 600 |
| TIME_WAIT timer: also called 2MSL timer, is set on entry to closing TIME_WAIT state. When TCP performs active close and sends final ACK, that connection must stay in the TIME_WAIT state for time up to 2*MSL. |
In SSF TCP, MSL is DML-configurable with default value of 60 seconds, so that 2MSL
defaults to 120 seconds (RFC 793):
MSL 60 |
| 3. RTT estimation and RTO calculation |
| RFC requirements |
|---|
|
RFC 793: Measure the elapsed time between sending a data octet with a particular sequence number and receiving an ACK that covers that sequence number (segments sent do not have to match segments ACKed). RFCs do not specify when to perform RTT measurements, nor on which segments (except for exclusion of retransmissions, cf. Karn's algorithm), but do specify how to compute RTO given an RTT measurement. Computation of RTO (RFC 1122): A host TCP MUST implement Karn's algorithm and Jacobson's algorithm. The following values SHOULD be used to initialize the estimation parameters for a new connection:
The lower RTO bound SHOULD be measured in fractions of a second, the upper bound should be 2*MSL, i.e., 240 seconds. |
| SSF TCP implementation |
|
The RTT measurement algorithm and RTO calculation algorithm are adapted from S94 (pp. 301-303) and WS95 (pp. 836-847). Since the description of the RTT measurement algorithm is confusing, here is a concise summary:
|
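For orientation, the textbook form of Jacobson's estimator combined with Karn's rule can be sketched as below. This is an assumption-laden illustration, not the SSF TCP code: the class name is hypothetical, the gains 1/8 and 1/4 and the formula RTO = srtt + 4*rttvar are the commonly published ones, and the bounds of 1 and 64 seconds and the initial RTO of 3 seconds are taken from the timer section of this page. The actual S94/WS95 code uses scaled integer arithmetic and tick units, which this sketch does not reproduce.

```python
class RttEstimator:
    """Hypothetical floating-point Jacobson/Karn RTO estimator."""

    def __init__(self, initial_rto=3.0, min_rto=1.0, max_rto=64.0):
        self.srtt = None    # smoothed RTT, seconds
        self.rttvar = None  # smoothed mean deviation, seconds
        self.rto = initial_rto
        self.min_rto = min_rto
        self.max_rto = max_rto

    def update(self, rtt, retransmitted=False):
        """Feed one RTT sample (seconds) and return the resulting RTO."""
        if retransmitted:
            # Karn's algorithm: ignore samples from retransmitted segments.
            return self.rto
        if self.srtt is None:
            # First valid measurement initializes both estimators.
            self.srtt = rtt
            self.rttvar = rtt / 2
        else:
            err = rtt - self.srtt
            self.srtt += err / 8                        # gain 1/8
            self.rttvar += (abs(err) - self.rttvar) / 4  # gain 1/4
        rto = self.srtt + 4 * self.rttvar
        self.rto = min(max(rto, self.min_rto), self.max_rto)
        return self.rto
```

For example, two consecutive 1-second samples give an RTO of 3.0 and then 2.5 seconds, and a sample taken on a retransmitted segment leaves the RTO unchanged.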
| 4. Send window management & ACK processing | ||||||||||||||||||||
|
General rule: when an ACK is received, first update all affected state variables, then send the full usable window of segments.
Send Sequence Space (in bytes):
   1        2            3             4
------][---------][--------------][--------- increasing SN
       |          |               |
       |          |               |
    snd_una    snd_max     snd_una+snd_wnd
Terminology:
seg_seq + seg_len - 1 = last sequence number of a segment | ||||||||||||||||||||
| RFC requirements/common choices implemented in SSF TCP | ||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
Adopted from WS95.
Acceptable ACK numbers always satisfy:
snd_una =< seg_ack =< snd_max
(the original definitions in RFC 793 are obsoleted). For a retransmitted segment:
snd_nxt < snd_max
because when sending new data, snd_nxt = snd_max; but for a retransmission snd_nxt = snd_una. It is possible to have seg_ack > snd_nxt because of packet reordering. In SSF TCP, the usable window definition below holds both for new data transmission and for retransmission. The usable window size (the amount of data that may currently be sent) is:
D = snd_una + snd_wnd - snd_nxt
The rule for updating snd_wnd depends on the current sending state, such as Slow Start, Congestion Avoidance, etc., and on the TCP variant. | ||||||||||||||||||||
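The ACK acceptability test and the usable window formula above translate directly into code. The following is a minimal sketch with hypothetical function names; the variable names follow the send sequence space notation of this section, and no sequence-number wrap-around is handled (consistent with the no-wrap-around statement in section 1).

```python
def is_acceptable_ack(seg_ack, snd_una, snd_max):
    """True if snd_una <= seg_ack <= snd_max."""
    return snd_una <= seg_ack <= snd_max


def usable_window(snd_una, snd_wnd, snd_nxt):
    """D = snd_una + snd_wnd - snd_nxt, in bytes (may be zero or negative)."""
    return snd_una + snd_wnd - snd_nxt


# New data: snd_nxt == snd_max, 4096 of 8192 window bytes are in flight.
d_new = usable_window(snd_una=10000, snd_wnd=8192, snd_nxt=14096)

# Retransmission: snd_nxt is pulled back to snd_una, so the whole
# send window counts as usable.
d_rexmit = usable_window(snd_una=10000, snd_wnd=8192, snd_nxt=10000)
```

In this example d_new is 4096 bytes and d_rexmit is 8192 bytes, which illustrates why a single definition of D can serve both cases.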
|
TCP session sending states In this section we consider only the connection ESTABLISHED session state. SSF TCP follows requirements from RFC 2581 and WS95; and in cases of differences between them, the implemented choice is noted. It is convenient to restate the four classical congestion control algorithms and the corresponding ACK processing rules in terms of TCP session sending states. In a TCP session, the sending states are:
A sending state is defined in terms of the current values of ssthresh and duplicate ACK counter. A transition from one sending state to another may take place on either of the following events:
Initial sending state: when a TCP session enters the ESTABLISHED state after completing the 3-way handshake, its sending state is Slow Start with initial values of ssthresh and cwnd = IW. For each sending state, if the retransmission timer fires, reset ssthresh and cwnd:
ssthresh = max(flight_size/2, 2*SMSS)
cwnd = LW = SMSS
then perform the RTT and related updates, transition to Slow Start, and send the segment
with SN snd_una. Following WS95, SSF TCP uses
flight_size = snd_wnd = min(cwnd, rwnd)
Note: In SSF TCP the value of the duplicate ACK threshold is set to 3 (RFC 2581). "Get 3 dup ACKs" means receiving 4 consecutive, identical ACKs without any other intervening packets. Identification of a duplicate ACK: A received TCP packet is a dup ACK if all of the following apply:
If a received TCP packet does not satisfy all of the above tests, reset the dup ACK counter to zero. Slow Start (all variants):
Congestion Avoidance (all variants):
Note: In SSF TCP Reno the extra additive term floor(MSS/8) in the congestion window increase, cwnd += SMSS*SMSS/cwnd + MSS/8, may be used or be commented out. Its use in the BSD Reno implementation is considered a bug, and is ruled out in RFC 2581.
Fast Retransmit (generic Tahoe):
Note: To execute Fast Retransmit, snd_nxt = snd_una. When exiting Fast Retransmit and entering Slow Start, there are two possible choices for the value of snd_nxt (not specified in RFCs):
SSF TCP Tahoe implements choice 1 for agreement with the ns-2 implementation. Fast Retransmit/Fast Recovery (generic Reno):
Note 1: RFC 2581 and S94 (p. 312) state that a new data ACK in the above table "should be the ACK of the retransmission [FastRexmit], ... Additionally, this ACK should acknowledge all the intermediate segments sent between the lost packet and the receipt of the first duplicate ACK." However, no such test is made in the WS95 source code, and SSF TCP does not implement it either.
Note 2: To execute Fast Retransmit, snd_nxt = snd_una. When transitioning from the Fast Retransmit to the Fast Recovery phase, there are two possible choices for the next value of snd_nxt (not specified in the RFCs):
SSF TCP Reno implements choice 2 for agreement with the BSD implementation in WS95.
Note 3: A new data packet can be sent only when the usable window size satisfies D > 0 (in SSF TCP only full-sized data segments are sent, thus D >= MSS), implying the condition:
min(cwnd, rwnd) - (snd_nxt - snd_una) >= MSS
After Fast Retransmission, snd_nxt returns to its immediately preceding value, usually snd_nxt = snd_max. Therefore, because cwnd is reduced after Fast Retransmission, the usable window size D may become zero or negative, preventing packet transmission during Fast Recovery until enough dup ACKs are received to open the window.
Note 4: A modification of Fast Retransmit/Fast Recovery, called NewReno, is defined in RFC 2582. It deals much better with multiple losses per window. NewReno is currently not implemented in SSF TCP. | ||||||||||||||||||||
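The timeout reset rule and the congestion avoidance increase quoted in this section can be sketched as below. This is an illustration only: the function names are hypothetical, all quantities are integer byte counts, and integer (floor) division is an assumption modeled on the floor(MSS/8) notation used above.

```python
def on_rexmit_timeout(cwnd, rwnd, smss):
    """Return (ssthresh, cwnd) after a retransmission timeout.

    Uses flight_size = min(cwnd, rwnd) as stated above, then
    ssthresh = max(flight_size/2, 2*SMSS) and cwnd = LW = SMSS.
    """
    flight_size = min(cwnd, rwnd)
    ssthresh = max(flight_size // 2, 2 * smss)
    return ssthresh, smss


def congestion_avoidance_cwnd(cwnd, smss, bsd_extra_term=False):
    """Return cwnd after one new-data ACK in Congestion Avoidance.

    cwnd += SMSS*SMSS/cwnd, optionally with the extra floor(MSS/8) term
    that RFC 2581 rules out.
    """
    increase = smss * smss // cwnd
    if bsd_extra_term:
        increase += smss // 8
    return cwnd + increase


# 16 segments of 1024 bytes in flight when the timer fires:
ssthresh, cwnd = on_rexmit_timeout(cwnd=16384, rwnd=16384, smss=1024)
```

In this example the timeout leaves ssthresh at 8192 bytes and cwnd at 1024 bytes, after which the session restarts in Slow Start.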
| RFC requirements NOT implemented in SSF TCP | ||||||||||||||||||||
| In SSF TCP only full-sized packets can be sent. If the usable window D < MSS, no packet is sent (it will be sent when more data is available). |
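The full-sized-segment rule can be sketched as a function that counts how many segments may be sent at a given moment. The function name and the bytes_available parameter (data queued by the application) are hypothetical; the window condition is the one given in Note 3 of the previous section.

```python
def full_segments_to_send(cwnd, rwnd, snd_una, snd_nxt, mss, bytes_available):
    """Number of full-sized segments that may be sent now.

    The usable window is min(cwnd, rwnd) - (snd_nxt - snd_una); nothing is
    sent unless both the window and the queued data cover at least one MSS.
    """
    usable = min(cwnd, rwnd) - (snd_nxt - snd_una)
    sendable_bytes = min(usable, bytes_available)
    if sendable_bytes < mss:
        return 0
    return sendable_bytes // mss


# Window of 4 segments, plenty of data queued: 4 segments go out.
n_open = full_segments_to_send(4096, 16384, 0, 0, 1024, 10000)

# cwnd halved while 4096 bytes are still in flight: the usable window
# is negative, so nothing is sent until dup ACKs reopen it.
n_closed = full_segments_to_send(2048, 16384, 0, 4096, 1024, 10000)
```

The second call shows the situation described in Note 3, where transmission stalls during Fast Recovery.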
| 5. Receive window management & ACK generation | ||||||||||
Receive Sequence Space (in bytes, WS95 p. 809):
   1               2                 3
------][-----------------------][--------- increasing SN
       |                        |
       |                        |
    rcv_nxt         rcv_adv = rcv_nxt+rcv_wnd
Terminology:
A segment is judged to occupy a portion of valid receive sequence space if either of the following two conditions is true:
rcv_nxt =< seg_seq < rcv_nxt + rcv_wnd
rcv_nxt =< seg_seq + seg_len - 1 < rcv_nxt + rcv_wnd | ||||||||||
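The two-condition acceptance test above can be written as a short predicate. This is a sketch with a hypothetical function name; it applies the conditions exactly as stated and does not add the special cases that RFC 793 defines for zero-length segments or a zero receive window.

```python
def segment_in_receive_window(seg_seq, seg_len, rcv_nxt, rcv_wnd):
    """True if the first or the last byte of the segment lies in the window."""
    window_end = rcv_nxt + rcv_wnd  # right window edge, rcv_adv
    first_in = rcv_nxt <= seg_seq < window_end
    last_in = rcv_nxt <= seg_seq + seg_len - 1 < window_end
    return first_in or last_in


# Window covers sequence numbers 1000 .. 5095.
in_order = segment_in_receive_window(1000, 1024, rcv_nxt=1000, rcv_wnd=4096)
old_dup = segment_in_receive_window(0, 1000, rcv_nxt=1000, rcv_wnd=4096)
overlap = segment_in_receive_window(500, 1024, rcv_nxt=1000, rcv_wnd=4096)
```

Here the in-order segment and the partially overlapping segment are accepted, while the old duplicate that ends just before rcv_nxt is not.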
| RFC requirements implemented in SSF TCP | ||||||||||
|---|---|---|---|---|---|---|---|---|---|---|
|
General rule: when a segment is received, first update all affected state variables, then generate an ACK if one is required. ACK generation requirements (RFC 1122, 2581):
SSF TCP implementation: When the delayed-ack option is not set, SSF TCP generates an ACK for every segment received, immediately after segment reception.
Delayed-ack option: In SSF TCP the delayed-ack option is DML-configurable, and can be selected in tcpinit with:
delayed_ack true
SSF TCP implementation: Delayed ACKs are timed by a fast clock with DML-configurable period TCP_FAST_INTERVAL (default 200 ms). The following is implemented:
| ||||||||||
| RFC requirements NOT implemented in SSF TCP | ||||||||||
|
Avoidance of Silly Window Syndrome (SWS): avoid advancing the right receive
window edge rcv_nxt + rcv_wnd in small increments.
In SSF TCP the SWS algorithm is not implemented. SSF TCP sends only full-sized packets (rcv_nxt increases by MSS), and all received data are immediately consumed by the receiver (rcv_wnd = receiver buffer size). |
| 6. Packet loss identification and retransmission management |
| RFC requirements implemented in SSF TCP |
|---|
|
Possible packet loss is identified at a sender either by a segment retransmission timer timeout, or by receipt of 3 consecutive duplicate acknowledgments. This section repeats some of the rules presented in other sections. |
|
Retransmission timer timeout: The following sequence of steps is taken when a retransmission timeout occurs. It includes Karn's algorithm.
|
|
Duplicate ACKs: In SSF TCP the value of the duplicate ACK threshold is set to 3 (RFC 2581). Processing of duplicate ACKs depends on the sending state; see the section "Send window management & ACK processing". Identification of a duplicate ACK: A received TCP packet is a dup ACK if all of the following apply:
If a received TCP packet does not satisfy all of the above tests, reset the dup ACK counter to zero. |
| 7. Opening a TCP connection (session). |
| RFC requirements implemented in SSF TCP |
|---|
Sequence number synchronization (3-way handshake):
|
Simultaneous open:
|
| 8. Closing a connection |
| RFC requirements implemented in SSF TCP |
|---|
| In SSF TCP the closing process is compliant with the TCP state transition diagram in reference [WS95], which is slightly different from RFC793. |
| 9. TCP Connection States | ||||||||||||||||||||||
| RFC requirements implemented in SSF TCP | ||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
SSF TCP implements all states and transitions. It has complete opening and closing phases including half-closing.
|
| SSF Implementation of TCP Variants |
|
Distinct TCP variants can be selected from DML by setting the appropriate option attributes. SSF.OS.TCP currently supports two TCP option attributes, delayed_ack and fast_recovery.
To select TCP Tahoe from DML, in tcpinit use:
fast_recovery false
To select TCP Reno from DML, in tcpinit use:
fast_recovery true
In both Tahoe and Reno the delayed ACK option can be selected in tcpinit with:
delayed_ack true
A note about the meaning of "TCP variants": SSF TCP uses the names "Tahoe" and "Reno" in the sense of generic behavior, not implying that the behavior is identical to the original BSD TCP implementations known under these names. Tahoe includes Slow Start, Congestion Avoidance, and Fast Retransmission, but not Fast Recovery. Reno extends Tahoe by the addition of Fast Recovery. The extra additive term floor(MSS/8) in the congestion window increase during congestion avoidance may be used or be commented out. Its use in TCP implementations is considered a bug, and is ruled out in RFC 2581. |
About this document
The SSF TCP pages and content created by Hongbo Liu and
Andy Ogielski with partial support from AT&T Labs-Research.
Last updated June 6, 2000.
Entire Contents Copyright © 1998, 1999, 2000 SSF Research Network. All rights
reserved.