Sliding Window Protocol and TCP Congestion Control

Download as pdf or txt
Download as pdf or txt
You are on page 1of 41

Sliding Window Protocol and

TCP Congestion Control


Simon S. Lam
Department of Computer Science
Th University
The U i it offT
Texas att A
Austin
ti

TCP Congestion Control (Simon S. Lam) 1

1
Reliable data transfer
f
 important in app., transport, link layers

 characteristics of unreliable channel will determine


p y of reliable data transfer protocol
complexity p (rdt)
( )

TCP Congestion Control (Simon S. Lam) 2

2
Channel Abstractions
Abstract ons
 Lossy FIFO channel
 delivers a subsequence in FIFO order
 example:
p delivery
y service provided
p by
ya
physical link

 Lossy,
L reordering,
d i d
duplicative
li i (LRD)
channel
 example:
x mpl :delivery
d liv service
s vic p provided
vid d b
by IP or by
b
UDP protocol

TCP Congestion Control (Simon S. Lam) 3

3
Sliding Window Protocol
 Consider an infinite array, Source, at the
sender, and an infinite array, Sink, at the
receiver.

Source: send window


P1 0 1 2 a–1 a s–1 s
Sender
acknowledged
g unacknowledged
g

next expected r + RW – 1
Sink: received
0 1 2 r
P2
Receiver delivered receive window
SW send
d window
d size ((s - a ≤ SW))
RW receive window size
TCP Congestion Control (Simon S. Lam) 4

4
Sliding
Sl d ng W
Windows
ndows in
n Act
Action
on
 Data unit r has just been received by P2
 Receive window slides forward

 P2 sends cumulative ack with sequence


number it expects to receive next (r+3)
Source: send window
P1
0 1 2 a–1 a s–1 s
Sender
acknowledged unacknowledged
r+3

Sink: next expected r + RW – 1


P2
0 1 2 r
Receiver

delivered receive window


TCP Congestion Control (Simon S. Lam) 5

5
Sliding
Sl d ng W
Windows
ndows in
n Act
Action
on
 P1 has just received cumulative ack with
r+3
3 as nextt expected
t d sequence number
b
 Send window slides forward
Source: send window
P1
Sender 0 1 2 a–1 a s–1 s

acknowledged

next expected r + RW – 1
Sink:
P2
0 1 2 r
Receiver

delivered receive window

TCP Congestion Control (Simon S. Lam) 6

6
Sliding Window protocol
 Functions provided
 error control (reliable delivery)
 in-order delivery
 flow and congestion control (by varying send
window
i d size)
i )
 TCP uses cumulative acks (needed for correctness)
 Other
Oth ki kinds
d of f acks
k (to improve performance)
 selective nack
 selective ack (TCP SACK)
 bit-vector representing entire state of receive
window ((in addition to first sequence
q number of
window)
TCP Congestion Control (Simon S. Lam) 7

7
Sliding Windows for Lossy FIFO Channels
 A small
ll number
b off bi
bits iin packet
k h header
d f for
sequence number
 Necessary and sufficient condition for correct
operation: SW + RW ≤ MaxSeqNum
 Necessity: RW receive window size
SW send window size
Source: send window
P1
Sender
0 1 2 a–1 a
acknowledged unacknowledged

Sink:
next expected
0 1 2
P2
Receiver delivered receive window

TCP Congestion Control (Simon S. Lam) 8

8
Sliding Windows for Lossy FIFO
Ch
Channels
l
 Sufficiency
y can only
y be  Interesting
g special
p cases
demonstrated by using a  SW = RW = 1
formal method to prove alternating-bit
that the protocol protocoll
provides reliable in-
 SW = 7, RW = 1
order delivery. See
Sh k and
Shankar dLLam, ACM out of order arrivals
out-of-order
TOPLAS, Vol. 14, No. 3, not accepted, e.g.,
July
y 1992. HDLC
 SW = RW

TCP Congestion Control (Simon S. Lam) 9

9
Sliding
Sl d ng W
Windows
ndows for LRD Channels
 Assumption: Packets have bounded lifetime L
 Be careful how fast sequence numbers are
consumed
m (i.e.,
( , byy arrival of
f data to be sent
into network)
(send rate)× L < MaxSeqNum
 TCP
 32-bit sequence numbers
 counts bytes
 assumes that datagrams will be discarded by IP
if too old

TCP Congestion Control (Simon S. Lam) 10

10
Window Size Controls Sending
W g Rate

RTT

Source 1 2 W 1 2 W
time

data ACKs

Destination 1 2 W 1 2 W
time

 ~ W packets per RTT when no loss

TCP Congestion Control (Simon S. Lam) 11

11
Throughput
 Limit the number of unacked transmitted
packets
k ts iin th
the network
t k tto window
i d si
size W

W
 Max.
M throughput
th h t  packets/sec
k t /
RTT
W × MSS
= RTT bytes/sec
(assuming no loss, MSS denotes maximum segment size)

 Where did we apply Little’s Law?


Answer : Consider the TCP send buffer

TCP Congestion Control (Simon S. Lam) 12

12
Throughput or send rate?
 Previous formula actually provides an upper bound
 Average
g number
m in the send buffer
ff is less than W unless
packet arrival rate to send buffer is infinite
 If a packet is lost in the network with probability p, then
g time in send buffer
the average ff is (1 − p ) × RTT + p × TO
Since TO > RTT, actual throughput is smaller.

 The
Th throughput
th hp t of
f a host
h st’ss TCP send
s nd buffer
b ff is th
the
host’s send rate into the network (including
original transmissions and retransmissions)

TCP Congestion Control (Simon S. Lam) 13

13
Fast Retransmit
 Time-out period often  If sender receives 3
relatively long:
long duplicate ACKs for
long delay before
the same data, it

resending lost packet
 Detect lost segments supposes that
via duplicate ACKs segment after
 Sender often sends ACKed data was
many segments back-to- lost:
back
 If segment is lost,  fast retransmit:
there
h will
ll likely
l k l be many resend se
segment
ment
duplicate ACKs. before timer expires

TCP Congestion Control (Simon S. Lam) 14

14
Host A Host B

egment
eout forr 2nd se
time

time
Resending
R di a segmentt afterft ttriple
i l dduplicate
li t ACK
without waiting for timeout
TCP Congestion Control (Simon S. Lam) 15

15
TCP F
Flow Control
flow control receiver: explicitly informs
sender won’t overrun y
sender of (dynamically y
receiver’s buffers by changing) amount of
transmitting too much, free buffer space
too fast  RcvWindow field in
TCP segment header

sender keeps amount of


sender:
transmitted, unACKed
data less than most
y received
recently
RcvWindow value

buffer at receive side of a TCP connection

TCP Congestion Control (Simon S. Lam) 16

16
Causes/costs of congestion: scenario
 four senders
Q: what happens as λ and λ'in
 multi-hop paths in
increase at many
 Timeout & retransmit
senders? positive feedback
Host A  instability
λin : original
g data
λ'in : original data plus
retransmitted data
finite shared output
li k b
link buffers
ff

Host B λout

TCP Congestion Control (Simon S. Lam) 17

17
Effect of Congest
Congestion
on
 W too big for many flows -> congestion
 Packet loss -> transmissions on links a packet has
traversed prior to loss are wasted
 Congestion collapse due to too many retransmissions
and too much wasted transmission capacity
 October 1986, Internet had its first congestion
collapse
goodput

upper bound
desired
collapse
load (aggregate send rate)
TCP Congestion Control (Simon S. Lam) 18

18
TCP Window
W ndow Control

 Receiver flow control


 Avoid overloading receiver
 rcvwindow: receiver’s advertised window (also rwnd)
 Receiver sends rcvwindow to sender

 Network congestion control


 Sender tries to avoid overloading network
 It infers network congestion from “loss indications”
 congwin: congestion window (also cwnd)

 Sender sets W = min (congwin, rcvwindow)

TCP Congestion Control (Simon S. Lam) 19

19
TCP Congestion Control
 end-to-end control (no network How does sender
assistance) determine CongWin?
 sender limits transmission:  loss event = timeout or
LastByteSent-LastByteAcked 3 duplicate acks
≤ CongWin  TCP sender reduces
 Roughly,
Roughly the send buffer’s
buffer s CongWin after a loss
event
three mechanisms:
CongWin
throughput
h h ≤
RTT
/
bytes/sec  slow
l start
 reduce to 1 segment
where CongWin is in bytes after timeout event
 AIMD (additive
( ddi i iincrease
multiplicative decrease)

Note: For now consider RcvWindow to be very large such that the send window size is
equall to
t CongWin.
C Wi

TCP Congestion Control (Simon S. Lam) 20

20
TCP Slow Start
 Probing for usable bandwidth

 When connection begins,


begins CongWin = 1 MSS
 Example: MSS = 500 bytes & RTT = 200 msec
 initial rate = 2500 bytes/sec
y = 20 kbps
p

 available bandwidth may y be >> MSS/RTT


 desirable to quickly ramp up to a higher rate

TCP Congestion Control (Simon S. Lam) 21

21
TCP Slow Start (more)
 When connection
Host A Host B
begins, increase rate
exponentially until

RTT
first loss event or
“threshold”
 double
d bl CongWini every
RTT
 done by incrementing
CongWin by 1 MSS for
every ACK received
 Summary: initial rate
is slow but ramps up
exponentially fast time

TCP Congestion Control (Simon S. Lam) 22

22
Congestion avoidance state &
responses to loss events 3 dup ACKs
Q: If no loss, when should

w size
14
the exponential increase 12
TCP
switch to linear? Reno

n window
10
A: When CongWin gets to

egments)
current value of 8

threshold 6

congestion
(se
threshold
Implementation:
4

2 TCP
 For initial slow start, Tahoe
0
threshold
h h ld is set to a large
l 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
value (e.g., 64 Kbytes)
 Subsequently, threshold is Transmission round
variable
bl Tahoe Reno
 At a loss event, threshold is
set to 1/2 of CongWin just Note: For simplicity,
p y CongWin
g is in number
b f
before lloss event of segments in the above graph.
TCP Congestion Control (Simon S. Lam) 23

23
Rationale f
for Reno’s F
Fast Recovery
y
 After 3 dup ACKs:

 3 dup ACKs indicate  CongWin


g is cut in half
f
(multiplicative decrease)
network capable of
 window then grows linearly
delivering some segments (additive
(add t ve increase)
ncrease)
 But after timeout event:
 timeout occurring  CongWin is set to 1 MSS

before 3 dup ACKs is instead;


“more alarming”  window then grows
exponentially to threshold,
then grows linearly

Additive Increase Multiplicative Decrease (AIMD)

TCP Congestion Control (Simon S. Lam) 24

24
TCP Reno ((example
mp scenario))
CongWin Timeout

3 dupACKs

halved

threshold
th sh ld reached
h d
Initial slow start
during tslow start
In this example, 3 dupACKs during slow
f
start before reachingg initial threshold

TCP Congestion Control (Simon S. Lam)


24

25
Example: FR/FR entry and ex
Example exitt

S 1 2 3 4 5 6 7 8 1 9 10 11
time
1 1 1 1 1 1 1 9 Exit FR/FR
loss
R time

cwnd 8 7 9 11 4 deflate cwnd


ssthresh 4 4 4 4
 Above scenario: Packet 1 is lost, packets 2, 3, and
4 are received; 3 dupACKs with seq. no. 1 returned
 Fast retransmit
 Retransmit packet 1 upon 3 dupACKs
 Fast recovery y (in
( steps)
p )
 Inflate cwnd with #dupACKs such that new packets 9,
10, and 11 can be sent while repairing loss
TCP Congestion Control (Simon S. Lam) 26

26
FR/FR ((in
n more deta
detail)
l)
 Enter FR/FR after 3 dupACKsp
 Set ssthresh ← max(flightsize/2, 2)
 Retransmit lost packet
 Set cwnd ← ssthresh + #dupACKs (window inflation)
 Wait till W=min(rwnd, cwnd) is large enough; transmit
new packet(s)
 On
O non-dup
d ACK (1 RTT later),
l t ) set
s t cwnd
d ← ssthresh
ssth sh
(window deflation)
 Enter Congestion Avoidance

TCP Congestion Control (Simon S. Lam) 27

27
Summary: TCP Congestion Control (Reno)
Summary
 When CongWin is below Threshold, sender in
slow-start phase,
phase window grows exponentially (until
loss event or exceeding threshold).

 When CongWin is above Threshold, sender is in


congestion-avoidance phase, window grows linearly.
 When a triple duplicate ACK occurs,
occurs Threshold
set to CongWin/2 and CongWin set to Threshold
(also fast retransmit)
 When timeout occurs, Threshold set to
CongWin/2 and CongWin is set to 1 MSS.

TCP Congestion Control (Simon S. Lam) 28

28
Successive
Success ve Timeouts
T meouts
 When there is another timeout, double the timeout
value
 Keep doing so for each additional loss-retransmission
 Exponential backoff up to
max timeout value equal
to 64 times initial timeout
value

(There are other variations.)

Note: red line in figure denotes first timeout


TCP Congestion Control (Simon S. Lam) 29

29
AIMD in steady state (when no timeout)
additive increase: multiplicative decrease:
increase CongWin by cut CongWin in half
1 MSS every RTT in after loss event (3 dup
the absence of any acks)
loss event: probing
congestion
window What limits the average
24 Kbytes window size (or throughput)?

16 Kbytes

8 Kbytes

time
Long-lived TCP connection
TCP Congestion Control (Simon S. Lam) 30

30
First approximation
M. Mathis, et al., “The Macroscopic Behavior of the TCP Congestion
Avoidance Algorithm,”ACM Computer Communicatons Review, 27(3), 1997.
 No slow-start, no timeout, long-lived TCP
connection
c nn cti n
 Independent identically distributed “periods”
 Three dupACKs are received in a round with
probability p
Ave.

# of RTTs
TCP Congestion Control (Simon S. Lam) 31

31
Geometric Distribution

Independent
d d trials
l - a triall fails
f l with
h probability
b bl p
Ave. no. of transmissions to get first “failure”
∞ ∞
n =  ibb =  i (1 − p)
i =1
i
i =1
i −1
p

= p  i (1 − p )i −1
i =1

d ∞ d ∞
= −p 
d i =1
dp
(1
( − p ) i
= − p
dp
d
 ((1 − p)
i =0
i

d 1 1
= −p =p 2
dp 1 − 1 + p p
= 1/ p

Ave no
Ave. no. of trials to get first “success”
success is
1/(1-p)
TCP Congestion Control (Simon S. Lam) 32

32
First
F rst approx
approximation
mat on (cont.)
 Average number of send rate (in packets/sec)
packets delivered in
3 2
one period (area under W
no. of packets/period
one saw-tooth) = = 8
time per period W 
2 2 RTT  
W  1 W  3 2  2
  +   = W
 2  2 2  8 =
1/ p
=
1 3
 2  RTT 2p
 Average number of RTT  
packets sent per period is  3p 
1/p
 Equate the two and solve
f W,
for W we gett 8
W =
3p TCP Congestion Control (Simon S. Lam) 33

33
TCP ACK generation
generat on [RFC 1122,, RFC 2581]
58 ]

Event at Receiver TCP Receiver action


Arrival of in-order segment with Delayed ACK. Wait up to 500ms
expected seq #. All data up to for next segment. If no next segment,
expected seq # already ACKed send ACK

Arrival of in-order segment with Immediately send single cumulative


expected seq #. One other ACK, ACKing both in-order segments
segment has ACK pending

Arrival of out-of-order segment Immediately send duplicate ACK,


higher-than-expect
higher than expect seq
seq. # . indicating seq. # of next expected byte
Gap detected

Arrival of segment that Immediate send ACK, provided that


partially
ti ll or completely
l t l fill
fills gap segmentt starts
t t att lower
l end
d off gap

TCP Congestion Control (Simon S. Lam) 34

34
Receiver implements
mp m Delayed
D y ACKsK
 Receiver sends one ACK for every two packets
received -> each saw-tooth is WxRTT wide
-> area under
d a saw-tooth h is 3W 2 1
=
4 p

 Send rate is
1/ p 1/ p 1 3
= =
RTT ⋅ W RTT ⋅ 4 / ((3 p ) RTT 4p
 One ACK for every b packets received -> send rate
is
1 3
RTT 2bp
p

TCP Congestion Control (Simon S. Lam) 35

35
Challenges
g in the future
f
 TCP average throughput (approximate) in terms of
loss rate, p
1.22 ⋅ MSS for b = 1
RTT p
 Example: 1500-byte segments, 100ms RTT, to get
10 Gbps throughput,
throughput loss rate needs to be very low
p = 2x10-10

 New versions of TCP needed for connections with


large delay-bandwidth product
 E g data center networks (local,
E.g., (local global)

TCP Congestion Control (Simon S. Lam) 36

36
A more detailed model

Reference:
J. Padhye, V. Firoiu, D. Towsley, and J. Kurose, “Modeling TCP
Th
Throughput:
h t A Simple
Si l Model
M d l and
d its
it E
Empirical
i i lVValidation,”
lid ti ”
Proceedings ACM SIGCOMM, 1998.

TCP Congestion Control (Simon S. Lam) 37

37
Mot vat on
Motivation
 Previous formulas not so accurate when
loss rates are high

 TCP traces show that there are more loss


indications due to timeouts ((TO)) than due
to triple dupACKs (TD)

TCP Congestion Control (Simon S. Lam) 38

38
AIMD with Timeouts

triple
i l ddup acks
k

 No slow start

 b = 1 (no delayed ack)

TCP Congestion Control (Simon S. Lam) 39

39
Problem 3 in HW #2
Simplified:
 no triple duplicate Acks
 packet loss (timeout) with probability p
 timeout interval fixed at T0 after each
loss

First success in
next cycle

TCP Congestion Control (Simon S. Lam) 40

40
The End

TCP Congestion Control (Simon S. Lam) 41

41

You might also like