TCP
TCP
TCP
CPSC 441:TCP 1
TCP segment structure
32 bits
URG: urgent data counting
(generally not used) source port # dest port #
by bytes
sequence number of data
ACK: ACK #
valid acknowledgement number (not segments!)
head not
PSH: push data now len used
UA P R S F Receive window
(generally not used) # bytes
checksum Urg data pnter
rcvr willing
RST, SYN, FIN: to accept
Options (variable length)
connection estab
(setup, teardown
commands)
application
Internet data
checksum (variable length)
(as in UDP)
CPSC 441:TCP 2
Sequence and Acknowledgement
Number
❒ TCP views data as unstructured, but
ordered stream of bytes.
❒ Sequence numbers are over bytes, not
segments
❒ Initial sequence number is chosen randomly
❒ TCP is full duplex – numbering of data is
independent in each direction
❒ Acknowledgement number – sequence
number of the next byte expected from
the sender
❒ ACKs are cumulative
CPSC 441:TCP 3
TCP seq. #’s and ACKs
Seq. #’s:
Host A Host B
❍ byte stream “number”
of first byte in 1000 byte Seq=4
data 2,
segment’s data ACK=
79, da
t a
ACKs: host ACKs
❍ seq # of next byte receipt of
d ata data
expected from other = 10 4 3, no
side 79 , AC K
Seq=
❍ cumulative ACK
CPSC 441:TCP 4
TCP reliable data transfer
❒ TCP creates rdt ❒ Retransmissions are
service on top of IP’s triggered by:
unreliable service ❍ timeout events
❒ Pipelined segments ❍ duplicate acks
❒ Cumulative acks ❒ Initially consider
❒ TCP uses single simplified TCP sender:
ignore duplicate acks
retransmission timer
❍
❍ ignore flow control,
congestion control
CPSC 441:TCP 5
TCP sender events:
data rcvd from app: timeout:
❒ Create segment with ❒ retransmit segment that
seq # caused timeout
❒ seq # is byte-stream ❒ restart timer
number of first data Ack rcvd:
byte in segment ❒ If acknowledges
❒ start timer if not previously unacked
already running (think segments
of timer as for oldest ❍ update what is known to be
acked
unacked segment) ❍ start timer if there are
❒ expiration interval: outstanding segments
TimeOutInterval
CPSC 441:TCP 6
NextSeqNum = InitialSeqNum
SendBase = InitialSeqNum
❒ speed-matching
service: matching the
send rate to the
receiving app’s drain
rate
❒ app process may be
slow at reading from
buffer
CPSC 441:TCP 8
TCP Flow control: how it works
❒ Rcvr advertises spare
room by including value
of RcvWindow in
segments
❒ Sender limits unACKed
(Suppose TCP receiver data to RcvWindow
discards out-of-order ❍ guarantees receive
segments) buffer doesn’t overflow
❒ spare room in buffer
= RcvWindow
= RcvBuffer-[LastByteRcvd -
LastByteRead]
CPSC 441:TCP 9
Silly Window Syndrome
❒ Recall: TCP uses sliding window
❒ “Silly Window” occurs when small-sized
segments are transmitted, resulting in
inefficient use of the network pipe
❒ For e.g., suppose that TCP sender
generates data slowly, 1-byte at a time
❒ Solution: wait until sender has enough data
to transmit – “Nagle’s Algorithm”
CPSC 441:TCP 10
Nagle’s Algorithm
1. TCP sender sends the first piece of data
obtained from the application (even if data
is only a few bytes).
CPSC 441:TCP 11
Silly Window Continued …
❒ Suppose that the receiver consumes data
slowly
❍ Receive Window opens slowly, and thus sender
is forced to send small-sized segments
❒ Solutions
❍ Delayed ACK
❍ Advertise Receive Window = 0, until reasonable
amount of space available in receiver’s buffer
CPSC 441:TCP 12
TCP Connection Management
Recall: TCP sender, receiver Three way handshake:
establish “connection” before
Step 1: client host sends TCP SYN
exchanging data segments
segment to server
❒ initialize TCP variables:
❍ specifies initial seq #
❍ seq. #s
❍ no data
❍ buffers, flow control info
Step 2: server host receives SYN,
(e.g. RcvWindow) replies with SYNACK segment
❒ client: connection initiator
❍ server allocates buffers
Socket clientSocket = new
❍ specifies server initial seq. #
Socket("hostname","port
number"); Step 3: client receives SYNACK,
replies with ACK segment, which
❒ server: contacted by client may contain data
Socket connectionSocket =
welcomeSocket.accept();
CPSC 441:TCP 13
TCP Connection Establishment
CPSC 441:TCP 14
TCP Connection Termination
client server
closing
FIN_WAIT1 FIN
CLOSE_WAIT
ACK
LAST_ACK
FIN_WAIT2 FIN
TIME_WAIT
timed wait
A CK
CLOSED
CLOSED
CPSC 441:TCP 15
Principles of Congestion Control
❒ Congestion: informally: “too many sources
sending too much data too fast for network to
handle”
❒ Different from flow control!
❒ Manifestations:
❍ Packet loss (buffer overflow at routers)
❍ Increased end-to-end delays (queuing in router buffers)
❒ Results in unfairness and poor utilization of
network resources
❍ Resources used by dropped packets (before they were
lost)
❍ Retransmissions
❍ Poor resource allocation at high load
CPSC 441:TCP 16
Historical Perspective
❒ October 1986, Internet had its first
congestion collapse
❒ Link LBL to UC Berkeley
❍ 400 yards, 3 hops, 32 Kbps
❍ throughput dropped to 40 bps
❍ factor of ~1000 drop!
❒ Van Jacobson proposes TCP Congestion
Control:
❍ Achieve high utilization
❍ Avoid congestion
❍ Share bandwidth
CPSC 441:TCP 17
Congestion Control: Approaches
❒ Goal: Throttle senders as needed to ensure
load on the network is “reasonable”
❒ End-end congestion control:
❍ no explicit feedback from network
❍ congestion inferred from end-system observed
loss, delay
❍ approach taken by TCP
CPSC 441:TCP 18
TCP Congestion Control: Overview
❒ end-end control (no network assistance)
❒ Limit the number of packets in the network to
window W
❒ Roughly,
W
rate = Bytes/sec
RTT
CPSC 441:TCP 19
TCP Congestion Controls
❒ Tahoe (Jacobson 1988)
❍ Slow Start
❍ Congestion Avoidance
❍ Fast Retransmit
❒ SACK
❒ Vegas (Brakmo & Peterson 1994)
❍ Delay and loss as indicators of congestion
CPSC 441:TCP 20
Slow Start
❒ “Slow Start” is used to receiver
sender
reach the equilibrium state cwnd
❒ Initially: W = 1 (slow start) 1 data
segment
❒ On each successful ACK:
ACK
W ←W + 1 2
❒ Exponential growth of W
each RTT: W ← 2 x W 3
4
❒ Enter CA when
W >= ssthresh
5
❒ ssthresh: window size after 6
7
which TCP cautiously probes 8
for bandwidth
CPSC 441:TCP 21
Congestion Avoidance
❒ Starts when sender receiver
W ≥ ssthresh 1 data
segment
❒ On each successful
ACK
ACK: 2
W ← W+ 1/W
❒ Linear growth of W
3
each RTT:
W ←W + 1
4
CPSC 441:TCP 22
CA: Additive Increase,
Multiplicative Decrease
❒ We have “additive increase” in the absence
of loss events
❒ After loss event, decrease congestion
window by half – “multiplicative decrease”
❍ ssthresh = W/2
❍ Enter Slow Start
CPSC 441:TCP 23
Detecting Packet Loss
❒ Assumption: loss 10
11
indicates congestion 12 X
13
❒ Option 1: time-out 14
15
❒ Option 2: duplicate
11
ACKs
11
CPSC 441:TCP 24
Fast Retransmit
❒ Wait for a timeout is quite long
❒ Immediately retransmits after 3
dupACKs without waiting for timeout
❒ Adjusts ssthresh
ssthresh ← W/2
❒ Enter Slow Start
W=1
CPSC 441:TCP 25
How to Set TCP Timeout Value?
❒ longer than RTT
❍ but RTT varies
❒ too short: premature timeout
❍ unnecessary retransmissions
❒ too long: slow reaction to segment
loss
CPSC 441:TCP 26
How to Estimate RTT?
CPSC 441:TCP 27
TCP Round-Trip Time and Timeout
EstimatedRTT = (1- α )*EstimatedRTT + α *SampleRTT
❒ EWMA 350
❒ influence of past
sample decreases
exponentially fast
❒ typical value: α =
0.125
300
CPSC 441:TCP 28
TCP Round Trip Time and Timeout
[Jacobson/Karels Algorithm]
(typically, β = 0.25)
Typically, µ =1 and Ø = 4.
CPSC 441:TCP 29
TCP Tahoe: Summary
❒ Basic ideas
❍ Gently probe network for spare capacity
❍ Drastically reduce rate on congestion
❍ Windowing: self-clocking
❍ Other functions: round trip time estimation,
error recovery
for every ACK {
if (W < ssthresh) then W++ (SS)
else W += 1/W (CA)
}
for every loss {
ssthresh = W/2
W =1
}
CPSC 441:TCP 30
TCP Tahoe
Window
W2
W1
ssthresh=W2/2
W2/2
ssthresh=W1/2
W1/2
Reached initial
ssthresh value; Time
switch to CA mode Slow Start
CPSC 441:TCP 31
Questions?
❒ Q. 1. To what value is ssthresh initialized to at
the start of the algorithm?
CPSC 441:TCP 32
TCP Reno
Note how there is “Fast Recovery” after cutting Window in half
Window
Reached initial
ssthresh value; Time
switch to CA mode Slow Start
CPSC 441:TCP 33
TCP Reno: Fast Recovery
❒ Objective: prevent `pipe’ from emptying after fast
retransmit
❍ each dup ACK represents a packet having left the pipe
(successfully received)
❍ Let’s enter the “FR/FR” mode on 3 dup ACKs
ssthresh ← W/2
retransmit lost packet
W ← ssthresh + ndup (window inflation)
Wait till W is large enough; transmit new packet(s)
On non-dup ACK (1 RTT later)
W ← ssthresh (window deflation)
enter CA mode
CPSC 441:TCP 34
TCP Reno: Summary
❒ Fast Recovery along with Fast Retransmit
used to avoid slow start
❒ On 3 duplicate ACKs
❍ Fast retransmit and fast recovery
❒ On timeout
❍ Fast retransmit and slow start
CPSC 441:TCP 35
TCP Throughput
❒ What’s the average throughout ot TCP as a
function of window size and RTT?
❍ Ignore slow start
❒ Let W be the window size when loss occurs.
❒ When window is W, throughput is W/RTT
❒ Just after loss, window drops to W/2,
throughput to W/2RTT.
❒ Average throughout: .75 W/RTT
CPSC 441:TCP 36
TCP Futures
❒ Example: 1500 byte segments, 100ms RTT, want 10
Gbps throughput
❒ Requires window size W = 83,333 in-flight
segments
❒ Throughput in terms of loss rate:
1.22 ⋅ MSS
RTT L
❒ ➜ L = 2·10-10 Wow
❒ New versions of TCP for high-speed needed!
CPSC 441:TCP 37
TCP Fairness
Fairness goal: if K TCP sessions share same
bottleneck link of bandwidth R, each should have
average rate of R/K
TCP connection 1
bottleneck
TCP
router
connection 2
capacity R
CPSC 441:TCP 38
Fairness (more)
❒ TCP fairness: dependency on RTT
❍ Connections with long RTT get less throughput
CPSC 441:TCP 39
Chapter 3: Summary
❒ principles behind transport layer
services:
❍ multiplexing, demultiplexing
❍ reliable data transfer
❍ flow control
❍ congestion control
CPSC 441:TCP 40