Sliding Window Protocol and TCP Congestion Control
Sliding Window Protocol and TCP Congestion Control
Sliding Window Protocol and TCP Congestion Control
1
Reliable data transfer
f
important in app., transport, link layers
2
Channel Abstractions
Abstract ons
Lossy FIFO channel
delivers a subsequence in FIFO order
example:
p delivery
y service provided
p by
ya
physical link
Lossy,
L reordering,
d i d
duplicative
li i (LRD)
channel
example:
x mpl :delivery
d liv service
s vic p provided
vid d b
by IP or by
b
UDP protocol
3
Sliding Window Protocol
Consider an infinite array, Source, at the
sender, and an infinite array, Sink, at the
receiver.
next expected r + RW – 1
Sink: received
0 1 2 r
P2
Receiver delivered receive window
SW send
d window
d size ((s - a ≤ SW))
RW receive window size
TCP Congestion Control (Simon S. Lam) 4
4
Sliding
Sl d ng W
Windows
ndows in
n Act
Action
on
Data unit r has just been received by P2
Receive window slides forward
5
Sliding
Sl d ng W
Windows
ndows in
n Act
Action
on
P1 has just received cumulative ack with
r+3
3 as nextt expected
t d sequence number
b
Send window slides forward
Source: send window
P1
Sender 0 1 2 a–1 a s–1 s
acknowledged
next expected r + RW – 1
Sink:
P2
0 1 2 r
Receiver
6
Sliding Window protocol
Functions provided
error control (reliable delivery)
in-order delivery
flow and congestion control (by varying send
window
i d size)
i )
TCP uses cumulative acks (needed for correctness)
Other
Oth ki kinds
d of f acks
k (to improve performance)
selective nack
selective ack (TCP SACK)
bit-vector representing entire state of receive
window ((in addition to first sequence
q number of
window)
TCP Congestion Control (Simon S. Lam) 7
7
Sliding Windows for Lossy FIFO Channels
A small
ll number
b off bi
bits iin packet
k h header
d f for
sequence number
Necessary and sufficient condition for correct
operation: SW + RW ≤ MaxSeqNum
Necessity: RW receive window size
SW send window size
Source: send window
P1
Sender
0 1 2 a–1 a
acknowledged unacknowledged
Sink:
next expected
0 1 2
P2
Receiver delivered receive window
8
Sliding Windows for Lossy FIFO
Ch
Channels
l
Sufficiency
y can only
y be Interesting
g special
p cases
demonstrated by using a SW = RW = 1
formal method to prove alternating-bit
that the protocol protocoll
provides reliable in-
SW = 7, RW = 1
order delivery. See
Sh k and
Shankar dLLam, ACM out of order arrivals
out-of-order
TOPLAS, Vol. 14, No. 3, not accepted, e.g.,
July
y 1992. HDLC
SW = RW
9
Sliding
Sl d ng W
Windows
ndows for LRD Channels
Assumption: Packets have bounded lifetime L
Be careful how fast sequence numbers are
consumed
m (i.e.,
( , byy arrival of
f data to be sent
into network)
(send rate)× L < MaxSeqNum
TCP
32-bit sequence numbers
counts bytes
assumes that datagrams will be discarded by IP
if too old
10
Window Size Controls Sending
W g Rate
RTT
Source 1 2 W 1 2 W
time
data ACKs
Destination 1 2 W 1 2 W
time
11
Throughput
Limit the number of unacked transmitted
packets
k ts iin th
the network
t k tto window
i d si
size W
W
Max.
M throughput
th h t packets/sec
k t /
RTT
W × MSS
= RTT bytes/sec
(assuming no loss, MSS denotes maximum segment size)
12
Throughput or send rate?
Previous formula actually provides an upper bound
Average
g number
m in the send buffer
ff is less than W unless
packet arrival rate to send buffer is infinite
If a packet is lost in the network with probability p, then
g time in send buffer
the average ff is (1 − p ) × RTT + p × TO
Since TO > RTT, actual throughput is smaller.
The
Th throughput
th hp t of
f a host
h st’ss TCP send
s nd buffer
b ff is th
the
host’s send rate into the network (including
original transmissions and retransmissions)
13
Fast Retransmit
Time-out period often If sender receives 3
relatively long:
long duplicate ACKs for
long delay before
the same data, it
resending lost packet
Detect lost segments supposes that
via duplicate ACKs segment after
Sender often sends ACKed data was
many segments back-to- lost:
back
If segment is lost, fast retransmit:
there
h will
ll likely
l k l be many resend se
segment
ment
duplicate ACKs. before timer expires
14
Host A Host B
egment
eout forr 2nd se
time
time
Resending
R di a segmentt afterft ttriple
i l dduplicate
li t ACK
without waiting for timeout
TCP Congestion Control (Simon S. Lam) 15
15
TCP F
Flow Control
flow control receiver: explicitly informs
sender won’t overrun y
sender of (dynamically y
receiver’s buffers by changing) amount of
transmitting too much, free buffer space
too fast RcvWindow field in
TCP segment header
16
Causes/costs of congestion: scenario
four senders
Q: what happens as λ and λ'in
multi-hop paths in
increase at many
Timeout & retransmit
senders? positive feedback
Host A instability
λin : original
g data
λ'in : original data plus
retransmitted data
finite shared output
li k b
link buffers
ff
Host B λout
17
Effect of Congest
Congestion
on
W too big for many flows -> congestion
Packet loss -> transmissions on links a packet has
traversed prior to loss are wasted
Congestion collapse due to too many retransmissions
and too much wasted transmission capacity
October 1986, Internet had its first congestion
collapse
goodput
upper bound
desired
collapse
load (aggregate send rate)
TCP Congestion Control (Simon S. Lam) 18
18
TCP Window
W ndow Control
19
TCP Congestion Control
end-to-end control (no network How does sender
assistance) determine CongWin?
sender limits transmission: loss event = timeout or
LastByteSent-LastByteAcked 3 duplicate acks
≤ CongWin TCP sender reduces
Roughly,
Roughly the send buffer’s
buffer s CongWin after a loss
event
three mechanisms:
CongWin
throughput
h h ≤
RTT
/
bytes/sec slow
l start
reduce to 1 segment
where CongWin is in bytes after timeout event
AIMD (additive
( ddi i iincrease
multiplicative decrease)
Note: For now consider RcvWindow to be very large such that the send window size is
equall to
t CongWin.
C Wi
20
TCP Slow Start
Probing for usable bandwidth
21
TCP Slow Start (more)
When connection
Host A Host B
begins, increase rate
exponentially until
RTT
first loss event or
“threshold”
double
d bl CongWini every
RTT
done by incrementing
CongWin by 1 MSS for
every ACK received
Summary: initial rate
is slow but ramps up
exponentially fast time
22
Congestion avoidance state &
responses to loss events 3 dup ACKs
Q: If no loss, when should
w size
14
the exponential increase 12
TCP
switch to linear? Reno
n window
10
A: When CongWin gets to
egments)
current value of 8
threshold 6
congestion
(se
threshold
Implementation:
4
2 TCP
For initial slow start, Tahoe
0
threshold
h h ld is set to a large
l 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
value (e.g., 64 Kbytes)
Subsequently, threshold is Transmission round
variable
bl Tahoe Reno
At a loss event, threshold is
set to 1/2 of CongWin just Note: For simplicity,
p y CongWin
g is in number
b f
before lloss event of segments in the above graph.
TCP Congestion Control (Simon S. Lam) 23
23
Rationale f
for Reno’s F
Fast Recovery
y
After 3 dup ACKs:
24
TCP Reno ((example
mp scenario))
CongWin Timeout
3 dupACKs
halved
threshold
th sh ld reached
h d
Initial slow start
during tslow start
In this example, 3 dupACKs during slow
f
start before reachingg initial threshold
25
Example: FR/FR entry and ex
Example exitt
S 1 2 3 4 5 6 7 8 1 9 10 11
time
1 1 1 1 1 1 1 9 Exit FR/FR
loss
R time
26
FR/FR ((in
n more deta
detail)
l)
Enter FR/FR after 3 dupACKsp
Set ssthresh ← max(flightsize/2, 2)
Retransmit lost packet
Set cwnd ← ssthresh + #dupACKs (window inflation)
Wait till W=min(rwnd, cwnd) is large enough; transmit
new packet(s)
On
O non-dup
d ACK (1 RTT later),
l t ) set
s t cwnd
d ← ssthresh
ssth sh
(window deflation)
Enter Congestion Avoidance
27
Summary: TCP Congestion Control (Reno)
Summary
When CongWin is below Threshold, sender in
slow-start phase,
phase window grows exponentially (until
loss event or exceeding threshold).
28
Successive
Success ve Timeouts
T meouts
When there is another timeout, double the timeout
value
Keep doing so for each additional loss-retransmission
Exponential backoff up to
max timeout value equal
to 64 times initial timeout
value
29
AIMD in steady state (when no timeout)
additive increase: multiplicative decrease:
increase CongWin by cut CongWin in half
1 MSS every RTT in after loss event (3 dup
the absence of any acks)
loss event: probing
congestion
window What limits the average
24 Kbytes window size (or throughput)?
16 Kbytes
8 Kbytes
time
Long-lived TCP connection
TCP Congestion Control (Simon S. Lam) 30
30
First approximation
M. Mathis, et al., “The Macroscopic Behavior of the TCP Congestion
Avoidance Algorithm,”ACM Computer Communicatons Review, 27(3), 1997.
No slow-start, no timeout, long-lived TCP
connection
c nn cti n
Independent identically distributed “periods”
Three dupACKs are received in a round with
probability p
Ave.
# of RTTs
TCP Congestion Control (Simon S. Lam) 31
31
Geometric Distribution
Independent
d d trials
l - a triall fails
f l with
h probability
b bl p
Ave. no. of transmissions to get first “failure”
∞ ∞
n = ibb = i (1 − p)
i =1
i
i =1
i −1
p
∞
= p i (1 − p )i −1
i =1
d ∞ d ∞
= −p
d i =1
dp
(1
( − p ) i
= − p
dp
d
((1 − p)
i =0
i
d 1 1
= −p =p 2
dp 1 − 1 + p p
= 1/ p
Ave no
Ave. no. of trials to get first “success”
success is
1/(1-p)
TCP Congestion Control (Simon S. Lam) 32
32
First
F rst approx
approximation
mat on (cont.)
Average number of send rate (in packets/sec)
packets delivered in
3 2
one period (area under W
no. of packets/period
one saw-tooth) = = 8
time per period W
2 2 RTT
W 1 W 3 2 2
+ = W
2 2 2 8 =
1/ p
=
1 3
2 RTT 2p
Average number of RTT
packets sent per period is 3p
1/p
Equate the two and solve
f W,
for W we gett 8
W =
3p TCP Congestion Control (Simon S. Lam) 33
33
TCP ACK generation
generat on [RFC 1122,, RFC 2581]
58 ]
34
Receiver implements
mp m Delayed
D y ACKsK
Receiver sends one ACK for every two packets
received -> each saw-tooth is WxRTT wide
-> area under
d a saw-tooth h is 3W 2 1
=
4 p
Send rate is
1/ p 1/ p 1 3
= =
RTT ⋅ W RTT ⋅ 4 / ((3 p ) RTT 4p
One ACK for every b packets received -> send rate
is
1 3
RTT 2bp
p
35
Challenges
g in the future
f
TCP average throughput (approximate) in terms of
loss rate, p
1.22 ⋅ MSS for b = 1
RTT p
Example: 1500-byte segments, 100ms RTT, to get
10 Gbps throughput,
throughput loss rate needs to be very low
p = 2x10-10
36
A more detailed model
Reference:
J. Padhye, V. Firoiu, D. Towsley, and J. Kurose, “Modeling TCP
Th
Throughput:
h t A Simple
Si l Model
M d l and
d its
it E
Empirical
i i lVValidation,”
lid ti ”
Proceedings ACM SIGCOMM, 1998.
37
Mot vat on
Motivation
Previous formulas not so accurate when
loss rates are high
38
AIMD with Timeouts
triple
i l ddup acks
k
No slow start
39
Problem 3 in HW #2
Simplified:
no triple duplicate Acks
packet loss (timeout) with probability p
timeout interval fixed at T0 after each
loss
First success in
next cycle
40
The End
41