TCP Flow Control and Congestion Control: EECS 489 Computer Networks Z. Morley Mao Monday Feb 5, 2007


TCP: Flow Control and Congestion Control

EECS 489 Computer Networks


http://www.eecs.umich.edu/courses/eecs489/w07
Z. Morley Mao
Monday Feb 5, 2007

Acknowledgement: Some slides taken from Kurose&Ross and Katz&Stoica Mao W07 1
TCP Flow Control
• receive side of TCP connection has a receive buffer
• app process may be slow at reading from buffer
• flow control: sender won't overflow receiver's buffer by transmitting too much, too fast
• speed-matching service: matching the send rate to the receiving app's drain rate
TCP Flow control: how it works
• Rcvr advertises spare room by including value of RcvWindow in segments
• Sender limits unACKed data to RcvWindow
  - guarantees receive buffer doesn't overflow
• spare room in buffer
  = RcvWindow
  = RcvBuffer - [LastByteRcvd - LastByteRead]
(Suppose TCP receiver discards out-of-order segments)
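The receiver-side formula and the sender-side limit can be sketched in Java as below. This is a minimal illustration, assuming byte counters are plain longs with no sequence-number wraparound; the class and method names are hypothetical and only reuse the slide's variable names.

```java
/**
 * Hypothetical sketch of TCP flow-control bookkeeping using the slide's variable names.
 * Byte counters are modeled as plain longs (no sequence-number wraparound).
 */
final class FlowControlWindow {
    /** Receiver side: spare room = RcvBuffer - [LastByteRcvd - LastByteRead]. */
    static long rcvWindow(long rcvBuffer, long lastByteRcvd, long lastByteRead) {
        long buffered = lastByteRcvd - lastByteRead; // received but not yet read by the app
        return rcvBuffer - buffered;
    }

    /** Sender side: may send len more bytes only if unACKed data stays within RcvWindow. */
    static boolean maySend(long lastByteSent, long lastByteAcked, long len, long rcvWindow) {
        long unacked = lastByteSent - lastByteAcked;
        return unacked + len <= rcvWindow;
    }
}
```

For example, a 4096-byte buffer holding 3000 received bytes of which the app has read 1000 advertises a RcvWindow of 2096 bytes.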
TCP Connection Management

Recall: TCP sender, receiver establish "connection" before exchanging data segments
• initialize TCP variables:
  - seq. #s
  - buffers, flow control info (e.g. RcvWindow)
• client: connection initiator
  Socket clientSocket = new Socket("hostname", portNumber); // portNumber is an int
• server: contacted by client
  Socket connectionSocket = welcomeSocket.accept();

Three way handshake:

Step 1: client host sends TCP SYN segment to server
  - specifies initial seq #
  - no data
Step 2: server host receives SYN, replies with SYNACK segment
  - server allocates buffers
  - specifies server initial seq. #
Step 3: client receives SYNACK, replies with ACK segment, which may contain data
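The sequence and acknowledgement numbers carried by the three handshake steps can be sketched as follows. This is a hypothetical model (not a socket API), assuming the standard rule that a SYN consumes one sequence number; it needs Java 16+ for the nested record.

```java
/** Hypothetical model of the seq/ACK numbers exchanged in TCP's three-way handshake. */
final class Handshake {
    /** Only the header fields the handshake sets; ackNum is meaningful only when ack is true. */
    record Segment(boolean syn, boolean ack, long seq, long ackNum) {}

    static Segment[] exchange(long clientIsn, long serverIsn) {
        // Step 1: client sends SYN carrying its initial seq #, no data.
        Segment syn = new Segment(true, false, clientIsn, 0);
        // Step 2: server replies SYNACK with its own initial seq #, acknowledging clientIsn + 1.
        Segment synAck = new Segment(true, true, serverIsn, syn.seq() + 1);
        // Step 3: client replies ACK (which may carry data), acknowledging serverIsn + 1.
        Segment ack = new Segment(false, true, syn.seq() + 1, synAck.seq() + 1);
        return new Segment[] {syn, synAck, ack};
    }
}
```

With client ISN 100 and server ISN 300, the SYNACK acknowledges 101 and the final ACK acknowledges 301.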
TCP Connection Management (cont.)

Closing a connection:

client closes socket: clientSocket.close();

Step 1: client end system sends TCP FIN control segment to server

Step 2: server receives FIN, replies with ACK. Closes connection, sends FIN.

[Figure: client/server timeline: client sends FIN (close), server replies ACK and then sends its own FIN (close); client replies ACK, waits in "timed wait", then the connection is closed]
TCP Connection Management (cont.)

Step 3: client receives FIN, replies with ACK.
  - Enters "timed wait": will respond with ACK to received FINs

Step 4: server receives ACK. Connection closed.

Note: with small modification, can handle simultaneous FINs.

[Figure: client/server timeline: FIN from client, ACK and FIN from server (both sides "closing"), final ACK from client; client leaves "timed wait" and both sides reach "closed"]
TCP Connection Management (cont)

[Figure: TCP server lifecycle]

[Figure: TCP client lifecycle]
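Since the lifecycle figures did not survive, here is a small sketch of the client-side path through the states, using the state names from RFC 793 for an active open followed by an active close. The event labels ("app_open", "rcv_synack", and so on) are hypothetical, and the server path and simultaneous-close handling are not modeled.

```java
/**
 * Sketch of the client-side TCP state sequence (active open, then active close),
 * using RFC 793 state names. Event labels are hypothetical.
 */
final class TcpClientLifecycle {
    enum State { CLOSED, SYN_SENT, ESTABLISHED, FIN_WAIT_1, FIN_WAIT_2, TIME_WAIT }

    static State next(State s, String event) {
        if (s == State.CLOSED && event.equals("app_open")) return State.SYN_SENT;         // send SYN
        if (s == State.SYN_SENT && event.equals("rcv_synack")) return State.ESTABLISHED;  // send ACK
        if (s == State.ESTABLISHED && event.equals("app_close")) return State.FIN_WAIT_1; // send FIN
        if (s == State.FIN_WAIT_1 && event.equals("rcv_ack")) return State.FIN_WAIT_2;
        if (s == State.FIN_WAIT_2 && event.equals("rcv_fin")) return State.TIME_WAIT;     // send ACK, timed wait
        if (s == State.TIME_WAIT && event.equals("timeout")) return State.CLOSED;         // timed wait expires
        throw new IllegalStateException("event " + event + " not handled in " + s);
    }

    /** Applies events in order starting from CLOSED and returns the final state. */
    static State run(String... events) {
        State s = State.CLOSED;
        for (String e : events) {
            s = next(s, e);
        }
        return s;
    }
}
```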
Principles of Congestion Control

Congestion:
• informally: "too many sources sending too much data too fast for network to handle"
• different from flow control!
• manifestations:
  - lost packets (buffer overflow at routers)
  - long delays (queueing in router buffers)
• a top-10 problem!
Causes/costs of congestion: scenario 1
• two senders, two receivers
• one router, infinite buffers
• no retransmission
• large delays when congested
• maximum achievable throughput

[Figure: Hosts A and B each send original data at rate λin into a router with unlimited shared output link buffers; λout is the rate delivered to the receivers]
Causes/costs of congestion: scenario 2
• one router, finite buffers
• sender retransmission of lost packet

[Figure: Host A sends λin (original data) and λ'in (original data, plus retransmitted data) into finite shared output link buffers; Host B receives λout]
Causes/costs of congestion: scenario 2
• always: λin = λout (goodput)
• "perfect" retransmission only when loss: λ'in > λout
• retransmission of delayed (not lost) packet makes λ'in larger (than perfect case) for same λout

[Figure: panels a, b, c plot λout against the offered load, each axis running to R/2; λout reaches R/2 in (a), R/3 in (b), and R/4 in (c)]

"costs" of congestion:
• more work (retrans) for given "goodput"
• unneeded retransmissions: link carries multiple copies of pkt
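The "more work for given goodput" cost can be quantified from the plot endpoints: the ratio λ'in / λout is the average number of transmissions per delivered packet. A small sketch, assuming both rates are given in the same units (here, fractions of R); the class name is hypothetical.

```java
/** Sketch: relate offered load λ'in to goodput λout (same units, e.g. fractions of R). */
final class RetransmissionCost {
    /** Average transmissions per delivered packet = λ'in / λout. */
    static double transmissionsPerDelivered(double lambdaInPrime, double lambdaOut) {
        return lambdaInPrime / lambdaOut;
    }

    /** Fraction of the offered load made up of retransmitted copies. */
    static double retransmittedFraction(double lambdaInPrime, double lambdaOut) {
        return (lambdaInPrime - lambdaOut) / lambdaInPrime;
    }
}
```

At λ'in = R/2, panel (b) with λout = R/3 implies 1.5 transmissions per delivered packet, and panel (c) with λout = R/4 implies 2, i.e. half of what the link carries is retransmitted copies.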
Causes/costs of congestion: scenario 3
• four senders
• multihop paths
• timeout/retransmit

Q: what happens as λin and λ'in increase?

[Figure: four hosts on multihop paths through routers with finite shared output link buffers; Host A sends λin (original data) and λ'in (original data, plus retransmitted data); λout is delivered to Host B]
Causes/costs of congestion: scenario 3

[Figure: λout, from Host A to Host B]

Another "cost" of congestion:
• when packet dropped, any "upstream" transmission capacity used for that packet was wasted!
Approaches towards congestion control
Two broad approaches towards congestion control:

End-end congestion control:
• no explicit feedback from network
• congestion inferred from end-system observed loss, delay
• approach taken by TCP

Network-assisted congestion control:
• routers provide feedback to end systems
  - single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
  - explicit rate sender should send at
Case study: ATM ABR congestion control

ABR: available bit rate:
• "elastic service"
• if sender's path "underloaded":
  - sender should use available bandwidth
• if sender's path congested:
  - sender throttled to minimum guaranteed rate

RM (resource management) cells:
• sent by sender, interspersed with data cells
• bits in RM cell set by switches ("network-assisted")
  - NI bit: no increase in rate (mild congestion)
  - CI bit: congestion indication
• RM cells returned to sender by receiver, with bits intact
Case study: ATM ABR congestion control
• two-byte ER (explicit rate) field in RM cell
  - congested switch may lower ER value in cell
  - sender's send rate is thus the minimum of the rates supportable by the switches on the path, i.e., the maximum rate the path can support
• EFCI bit in data cells: set to 1 in congested switch
  - if data cell preceding RM cell has EFCI set, receiver sets CI bit in returned RM cell
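The ER and CI mechanics above can be sketched as two small functions. This is a hypothetical model, assuming each switch is summarized by the single rate it can support; real ABR switch algorithms for choosing that rate are not shown.

```java
/** Hypothetical model of ATM ABR feedback as described on the slides. */
final class AbrFeedback {
    /** Each congested switch may lower ER; the returned ER is the smallest rate any switch allows. */
    static double explicitRate(double senderEr, double[] switchSupportableRates) {
        double er = senderEr;
        for (double r : switchSupportableRates) {
            er = Math.min(er, r); // a switch only ever lowers the ER value
        }
        return er;
    }

    /** CI in the returned RM cell: set by a switch, or by the receiver if the preceding data cell had EFCI set. */
    static boolean ciBit(boolean ciFromSwitches, boolean precedingDataCellEfci) {
        return ciFromSwitches || precedingDataCellEfci;
    }
}
```

With a requested ER of 155 and switches that can support 100, 40, and 80 (arbitrary example units), the sender ends up limited to 40.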
TCP Congestion Control
• end-end control (no network assistance)
• sender limits transmission:
  $\text{LastByteSent} - \text{LastByteAcked} \le \text{CongWin}$
• Roughly,
  $\text{rate} = \frac{\text{CongWin}}{\text{RTT}}$ Bytes/sec
• CongWin is dynamic, function of perceived network congestion

How does sender perceive congestion?
• loss event = timeout or 3 duplicate acks
• TCP sender reduces rate (CongWin) after loss event

three mechanisms:
  - AIMD
  - slow start
  - conservative after timeout events
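The sender-side limit and the rough rate estimate translate directly into code. A minimal sketch, assuming windows in bytes and RTT in seconds; the names are hypothetical.

```java
/** Sketch of the congestion-window check and rate estimate (bytes, seconds). */
final class CongestionWindowLimit {
    /** Sender may transmit len more bytes only if LastByteSent - LastByteAcked stays <= CongWin. */
    static boolean maySend(long lastByteSent, long lastByteAcked, long len, long congWin) {
        return (lastByteSent - lastByteAcked) + len <= congWin;
    }

    /** Roughly, rate = CongWin / RTT bytes/sec. */
    static double rateBytesPerSec(long congWin, double rttSec) {
        return congWin / rttSec;
    }
}
```

For instance, CongWin = 16,000 bytes and RTT = 100 ms give roughly 160,000 bytes/sec.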
TCP AIMD
multiplicative decrease: cut CongWin in half after loss event
additive increase: increase CongWin by 1 MSS every RTT in the absence of loss events: probing

[Figure: congestion window (8, 16, 24 Kbytes) versus time for a long-lived TCP connection, showing the AIMD sawtooth]
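The sawtooth can be reproduced with a per-RTT loop. This sketch assumes the window is measured in MSS units, the connection is already in congestion avoidance, and the rounds with a loss event are supplied as a hypothetical input set.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Sketch of AIMD, one step per RTT, window in MSS units. */
final class Aimd {
    /** Window at the start of each round; lossRounds (hypothetical input) marks rounds with a loss event. */
    static List<Double> trace(double initialWindow, int rtts, Set<Integer> lossRounds) {
        List<Double> windows = new ArrayList<>();
        double w = initialWindow;
        for (int round = 0; round < rtts; round++) {
            windows.add(w);
            if (lossRounds.contains(round)) {
                w = Math.max(w / 2, 1.0); // multiplicative decrease, not below 1 MSS
            } else {
                w += 1.0;                 // additive increase: +1 MSS per RTT
            }
        }
        return windows;
    }
}
```

Starting from 16 MSS with a loss in round 3, the trace is 16, 17, 18, 19, 9.5, 10.5.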
TCP Slow Start
• When connection begins, CongWin = 1 MSS
  - Example: MSS = 500 bytes & RTT = 200 msec
  - initial rate = 20 kbps
• available bandwidth may be >> MSS/RTT
  - desirable to quickly ramp up to respectable rate
• When connection begins, increase rate exponentially fast until first loss event
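The example's arithmetic (500 bytes × 8 bits / 0.2 s = 20 kbps), and how quickly doubling closes the gap, can be checked with a short sketch. RTT is taken in milliseconds to keep the math in integers; the 10 Mbps target used below is a hypothetical value, and loss-free doubling is assumed.

```java
/** Sketch of slow-start arithmetic: initial rate = MSS/RTT, doubling every RTT without loss. */
final class SlowStartRate {
    /** Initial rate in bits/sec with CongWin = 1 MSS. */
    static long initialRateBps(long mssBytes, long rttMs) {
        return mssBytes * 8 * 1000 / rttMs;
    }

    /** RTTs of loss-free doubling before the rate reaches targetBps (assumes a positive initial rate). */
    static int rttsToReach(long mssBytes, long rttMs, long targetBps) {
        long rate = initialRateBps(mssBytes, rttMs);
        int rtts = 0;
        while (rate < targetBps) {
            rate *= 2; // CongWin doubles every RTT in slow start
            rtts++;
        }
        return rtts;
    }
}
```

With MSS = 500 bytes and RTT = 200 ms, the initial rate is 20,000 bps, and 9 doublings (about 1.8 s) bring it past 10 Mbps.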
TCP Slow Start (more)
• When connection begins, increase rate exponentially until first loss event:
  - double CongWin every RTT
  - done by incrementing CongWin for every ACK received
• Summary: initial rate is slow but ramps up exponentially fast

[Figure: Host A sends one segment, then two segments, then four segments to Host B, one RTT apart]
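Why does "+1 MSS per ACK" double the window each RTT? Each segment in the current window produces one ACK, so a window of n segments grows by n. A sketch, assuming one ACK per segment (no delayed ACKs) and no loss:

```java
/** Sketch: incrementing CongWin by 1 MSS per ACK doubles it every RTT (window in segments). */
final class SlowStartAcks {
    static int windowAfterRtts(int rtts) {
        int congWin = 1;                     // CongWin = 1 MSS at connection start
        for (int r = 0; r < rtts; r++) {
            int acksThisRtt = congWin;       // assume every segment in the window is ACKed once
            for (int a = 0; a < acksThisRtt; a++) {
                congWin += 1;                // +1 MSS per ACK received
            }
        }
        return congWin;
    }
}
```

This yields 1, 2, 4, 8 segments after 0, 1, 2, 3 RTTs, matching the figure.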
Refinement
• After 3 dup ACKs:
  - CongWin is cut in half
  - window then grows linearly
• But after timeout event:
  - CongWin instead set to 1 MSS;
  - window then grows exponentially
  - to a threshold, then grows linearly

Philosophy:
• 3 dup ACKs indicates network capable of delivering some segments
• timeout before 3 dup ACKs is "more alarming"
Refinement (more)

Q: When should the exponential increase switch to linear?
A: When CongWin gets to 1/2 of its value before timeout.

Implementation:
• Variable Threshold
• At loss event, Threshold is set to 1/2 of CongWin just before loss event
Summary: TCP Congestion Control
• When CongWin is below Threshold, sender is in slow-start phase; window grows exponentially.
• When CongWin is above Threshold, sender is in congestion-avoidance phase; window grows linearly.
• When a triple duplicate ACK occurs, Threshold is set to CongWin/2 and CongWin is set to Threshold.
• When a timeout occurs, Threshold is set to CongWin/2 and CongWin is set to 1 MSS.
TCP sender congestion control

Event | State | TCP Sender Action | Commentary
ACK receipt for previously unacked data | Slow Start (SS) | CongWin = CongWin + MSS; if (CongWin > Threshold) set state to "Congestion Avoidance" | Resulting in a doubling of CongWin every RTT
ACK receipt for previously unacked data | Congestion Avoidance (CA) | CongWin = CongWin + MSS * (MSS/CongWin) | Additive increase, resulting in increase of CongWin by 1 MSS every RTT
Loss event detected by triple duplicate ACK | SS or CA | Threshold = CongWin/2; CongWin = Threshold; set state to "Congestion Avoidance" | Fast recovery, implementing multiplicative decrease. CongWin will not drop below 1 MSS.
Timeout | SS or CA | Threshold = CongWin/2; CongWin = 1 MSS; set state to "Slow Start" | Enter slow start
Duplicate ACK | SS or CA | Increment duplicate ACK count for segment being acked | CongWin and Threshold not changed
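The rows of the table map onto one event handler each. The sketch below follows only those rules (windows in bytes, with a hypothetical initial Threshold passed to the constructor); it is not a full TCP implementation and leaves out retransmission itself and the finer details of fast recovery.

```java
/** Sketch of the sender rules in the table; windows in bytes. Not a full TCP implementation. */
final class RenoSender {
    enum Phase { SLOW_START, CONGESTION_AVOIDANCE }

    final double mss;
    double congWin;
    double threshold;
    Phase phase = Phase.SLOW_START;
    int dupAcks = 0;

    RenoSender(double mss, double initialThreshold) {
        this.mss = mss;
        this.congWin = mss;                // connection begins with CongWin = 1 MSS
        this.threshold = initialThreshold; // hypothetical starting value
    }

    /** ACK receipt for previously unacked data. */
    void onNewAck() {
        dupAcks = 0;
        if (phase == Phase.SLOW_START) {
            congWin += mss;                            // doubles CongWin every RTT
            if (congWin > threshold) phase = Phase.CONGESTION_AVOIDANCE;
        } else {
            congWin += mss * (mss / congWin);          // about +1 MSS every RTT
        }
    }

    /** Duplicate ACK: count it; the third one is treated as a loss event. */
    void onDupAck() {
        dupAcks++;
        if (dupAcks == 3) {
            threshold = congWin / 2;
            congWin = Math.max(threshold, mss);        // CongWin will not drop below 1 MSS
            phase = Phase.CONGESTION_AVOIDANCE;
        }
    }

    /** Timeout: Threshold = CongWin/2, CongWin = 1 MSS, back to slow start. */
    void onTimeout() {
        threshold = congWin / 2;
        congWin = mss;
        phase = Phase.SLOW_START;
        dupAcks = 0;
    }
}
```

With MSS = 1000 and Threshold = 8000, eight new ACKs take CongWin to 9000 and into congestion avoidance; a triple duplicate ACK then sets both Threshold and CongWin to 4500, and a timeout after that gives Threshold = 2250 and CongWin = 1000.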
TCP throughput
• What's the average throughput of TCP as a function of window size and RTT?
  - Ignore slow start
• Let W be the window size when loss occurs.
• When window is W, throughput is W/RTT
• Just after loss, window drops to W/2, throughput to W/(2 RTT).
• Average throughput: 0.75 W/RTT
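The 0.75 factor comes from averaging a window that climbs linearly from W/2 back to W. A sketch that averages the per-RTT windows of one cycle, assuming W is an even number of MSS so W/2 is whole; the names are hypothetical.

```java
/** Sketch: average window over one sawtooth cycle growing by 1 MSS per RTT from W/2 to W. */
final class SawtoothAverage {
    /** W in MSS units, assumed even so W/2 is a whole number of segments. */
    static double averageWindow(int w) {
        long sum = 0;
        int rounds = 0;
        for (int window = w / 2; window <= w; window++) {
            sum += window;
            rounds++;
        }
        return (double) sum / rounds;
    }

    /** Average throughput in bytes/sec. */
    static double averageThroughput(int w, double mssBytes, double rttSec) {
        return averageWindow(w) * mssBytes / rttSec;
    }
}
```

For W = 20 the average window is 15 segments, i.e. exactly 0.75 W.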
TCP Futures
• Example: 1500 byte segments, 100ms RTT, want 10 Gbps throughput
• Requires window size W = 83,333 in-flight segments
• Throughput in terms of loss rate:
  $\frac{1.22 \cdot \text{MSS}}{\text{RTT}\,\sqrt{L}}$
• ➜ $L = 2 \cdot 10^{-10}$. Wow
• New versions of TCP for high-speed needed!
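The example's numbers can be reproduced by inverting the throughput formula. A sketch in bits and seconds (MSS = 1500 bytes = 12,000 bits); the time between losses assumes one segment in every 1/L is lost.

```java
/** Sketch of the high-speed TCP arithmetic (units: bits, seconds). */
final class HighSpeedTcp {
    /** In-flight segments needed: W = throughput * RTT / MSS. */
    static double windowSegments(double throughputBps, double rttSec, double mssBits) {
        return throughputBps * rttSec / mssBits;
    }

    /** Invert throughput = 1.22 * MSS / (RTT * sqrt(L)) for the loss rate L. */
    static double lossRate(double throughputBps, double rttSec, double mssBits) {
        double sqrtL = 1.22 * mssBits / (rttSec * throughputBps);
        return sqrtL * sqrtL;
    }

    /** Mean time between losses if one in every 1/L segments is lost. */
    static double secondsBetweenLosses(double throughputBps, double rttSec, double mssBits) {
        double segmentsPerSec = throughputBps / mssBits;
        return 1.0 / (lossRate(throughputBps, rttSec, mssBits) * segmentsPerSec);
    }
}
```

For 10 Gbps, 100 ms, and 12,000-bit segments this gives W ≈ 83,333, L ≈ 2.1·10^-10, and about 5,600 s (roughly 93 minutes) between losses.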
TCP Fairness
Fairness goal: if K TCP sessions share same bottleneck link of bandwidth R, each should have average rate of R/K

[Figure: TCP connections 1 and 2 share a bottleneck router of capacity R]
Why is TCP fair?
Two competing sessions:
• Additive increase gives slope of 1, as throughput increases
• multiplicative decrease decreases throughput proportionally

[Figure: Connection 2 throughput versus Connection 1 throughput, both axes up to R, with the "equal bandwidth share" line; the trajectory alternates "congestion avoidance: additive increase" and "loss: decrease window by factor of 2", approaching the equal share]
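The convergence argument can be simulated: additive increase preserves the gap between the two throughputs, while each halving cuts it in half. This sketch assumes both connections see a loss whenever their combined rate exceeds R (synchronized losses), with throughput in arbitrary units.

```java
/** Sketch: two AIMD flows sharing capacity R; both see a loss whenever their sum exceeds R (assumption). */
final class TwoFlowAimd {
    static double[] simulate(double x1, double x2, double capacity, int steps) {
        for (int i = 0; i < steps; i++) {
            if (x1 + x2 > capacity) {
                x1 /= 2; // loss: decrease window by factor of 2
                x2 /= 2;
            } else {
                x1 += 1; // congestion avoidance: additive increase, slope 1
                x2 += 1;
            }
        }
        return new double[] {x1, x2};
    }
}
```

Starting from unequal rates such as 10 and 80 with R = 100, the two throughputs end up essentially equal after a few hundred steps.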
Fairness (more)

Fairness and UDP
• Multimedia apps often do not use TCP
  - do not want rate throttled by congestion control
• Instead use UDP:
  - pump audio/video at constant rate, tolerate packet loss
• Research area: TCP friendly

Fairness and parallel TCP connections
• nothing prevents app from opening parallel connections between 2 hosts.
• Web browsers do this
• Example: link of rate R supporting 9 connections;
  - new app asks for 1 TCP, gets rate R/10
  - new app asks for 11 TCPs, gets R/2 !
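The example's shares follow from assuming every connection on the bottleneck gets an equal share of R. A one-function sketch:

```java
/** Sketch: fraction of R an app gets by opening k connections next to n existing ones, assuming equal per-connection shares. */
final class ParallelShare {
    static double shareOfR(int existingConnections, int newAppConnections) {
        return (double) newAppConnections / (existingConnections + newAppConnections);
    }
}
```

With 9 existing connections, 1 new connection gets 1/10 of R, while 11 new connections get 11/20 = 0.55 of R, roughly the R/2 on the slide.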
Delay modeling

Q: How long does it take to receive an object from a Web server after sending a request?

Ignoring congestion, delay is influenced by:
• TCP connection establishment
• data transmission delay
• slow start

Notation, assumptions:
• Assume one link between client and server of rate R
• S: MSS (bits)
• O: object size (bits)
• no retransmissions (no loss, no corruption)

Window size:
• First assume: fixed congestion window, W segments
• Then dynamic window, modeling slow start
TCP Delay Modeling: Slow Start (1)
Now suppose window grows according to slow start

Will show that the delay for one object is:

$$\text{Latency} = 2\,\text{RTT} + \frac{O}{R} + P\left[\text{RTT} + \frac{S}{R}\right] - \left(2^P - 1\right)\frac{S}{R}$$

where P is the number of times TCP idles at server:

$$P = \min\{Q, K - 1\}$$

- where Q is the number of times the server idles if the object were of infinite size.
- and K is the number of windows that cover the object.
TCP Delay Modeling: Slow Start (2)
Delay components:
• 2 RTT for connection estab and request
• O/R to transmit object
• time server idles due to slow start

Server idles: P = min{K-1, Q} times

Example:
• O/S = 15 segments
• K = 4 windows
• Q = 2
• P = min{K-1, Q} = 2

Server idles P = 2 times

[Figure: client/server timeline: initiate TCP connection (RTT), request object, first window = S/R, second window = 2S/R, third window = 4S/R, fourth window = 8S/R, complete transmission, object delivered]
TCP Delay Modeling (3)
$\frac{S}{R} + \text{RTT}$ = time from when server starts to send segment until server receives acknowledgement

$2^{k-1}\frac{S}{R}$ = time to transmit the kth window

$\left[\frac{S}{R} + \text{RTT} - 2^{k-1}\frac{S}{R}\right]^+$ = idle time after the kth window

$$\begin{aligned}
\text{delay} &= \frac{O}{R} + 2\,\text{RTT} + \sum_{p=1}^{P} \text{idleTime}_p \\
&= \frac{O}{R} + 2\,\text{RTT} + \sum_{k=1}^{P} \left[\frac{S}{R} + \text{RTT} - 2^{k-1}\frac{S}{R}\right] \\
&= \frac{O}{R} + 2\,\text{RTT} + P\left[\text{RTT} + \frac{S}{R}\right] - \left(2^P - 1\right)\frac{S}{R}
\end{aligned}$$

[Figure: client/server timeline: initiate TCP connection, request object, first window = S/R, second window = 2S/R, third window = 4S/R, fourth window = 8S/R, complete transmission, object delivered]
TCP Delay Modeling (4)
Recall K = number of windows that cover object

How do we calculate K?

$$\begin{aligned}
K &= \min\{k : 2^0 S + 2^1 S + \cdots + 2^{k-1} S \ge O\} \\
  &= \min\{k : 2^0 + 2^1 + \cdots + 2^{k-1} \ge O/S\} \\
  &= \min\{k : 2^k - 1 \ge O/S\} \\
  &= \min\{k : k \ge \log_2(O/S + 1)\} \\
  &= \left\lceil \log_2\left(\frac{O}{S} + 1\right) \right\rceil
\end{aligned}$$

Calculation of Q, number of idles for infinite-size object, is similar (see HW).
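The whole slow-start latency model can be sketched in a few functions: K by summing window sizes, Q by counting windows after which the idle time [S/R + RTT - 2^(k-1) S/R] is positive, and the latency formula with P = min{Q, K-1}. This is a sketch under the slides' assumptions (single link, no loss); the parameter values used below (S = 4000 bits, R = 80 kbps, RTT = 0.1 s) are hypothetical, chosen so that the slide's example O/S = 15, K = 4, Q = 2 holds.

```java
/** Sketch of the slow-start latency model (bits, bits/sec, seconds). */
final class SlowStartLatency {
    /** K: number of windows of 1, 2, 4, ... segments needed to cover an object of O bits. */
    static int windowsToCover(double objectBits, double mssBits) {
        int k = 0;
        double covered = 0;
        while (covered < objectBits) {
            covered += Math.pow(2, k) * mssBits; // window k+1 carries 2^k segments
            k++;
        }
        return k;
    }

    /** Q: for an infinite object, count windows k with S/R + RTT - 2^(k-1) S/R > 0. */
    static int idlesForInfiniteObject(double mssBits, double rateBps, double rttSec) {
        double sOverR = mssBits / rateBps;
        int q = 0;
        while (sOverR + rttSec - Math.pow(2, q) * sOverR > 0) {
            q++; // server idles after window q+1
        }
        return q;
    }

    /** Latency = 2 RTT + O/R + P[RTT + S/R] - (2^P - 1) S/R, with P = min{Q, K-1}. */
    static double latency(double objectBits, double mssBits, double rateBps, double rttSec) {
        double sOverR = mssBits / rateBps;
        int p = Math.min(idlesForInfiniteObject(mssBits, rateBps, rttSec),
                         windowsToCover(objectBits, mssBits) - 1);
        return 2 * rttSec + objectBits / rateBps + p * (rttSec + sOverR) - (Math.pow(2, p) - 1) * sOverR;
    }
}
```

With O = 15 S, these values give K = 4, Q = 2, P = 2, and a latency of 0.2 + 0.75 + 0.3 - 0.15 = 1.1 s.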
HTTP Modeling
• Assume Web page consists of:
  - 1 base HTML page (of size O bits)
  - M images (each of size O bits)
• Non-persistent HTTP:
  - M+1 TCP connections in series
  - Response time = (M+1)O/R + (M+1)2RTT + sum of idle times
• Persistent HTTP:
  - 2 RTT to request and receive base HTML file
  - 1 RTT to request and receive M images
  - Response time = (M+1)O/R + 3RTT + sum of idle times
• Non-persistent HTTP with X parallel connections
  - Suppose M/X integer.
  - 1 TCP connection for base file
  - M/X sets of parallel connections for images.
  - Response time = (M+1)O/R + (M/X + 1)2RTT + sum of idle times
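The three response-time formulas can be compared numerically. This sketch drops the "sum of idle times" term, so it gives lower bounds; the example below assumes O = 5 Kbytes is 40,000 bits (1 Kbyte = 1000 bytes) and a 1 Mbps link.

```java
/** Sketch of the HTTP response-time formulas, omitting the "sum of idle times" term (seconds). */
final class HttpResponseTime {
    static double nonPersistent(int m, double objectBits, double rateBps, double rttSec) {
        return (m + 1) * objectBits / rateBps + (m + 1) * 2 * rttSec;
    }

    static double persistent(int m, double objectBits, double rateBps, double rttSec) {
        return (m + 1) * objectBits / rateBps + 3 * rttSec;
    }

    /** Assumes M/X is an integer, as on the slide. */
    static double parallelNonPersistent(int m, int x, double objectBits, double rateBps, double rttSec) {
        return (m + 1) * objectBits / rateBps + (m / x + 1) * 2 * rttSec;
    }
}
```

With M = 10, X = 5, RTT = 100 ms: non-persistent ≈ 2.64 s, persistent ≈ 0.74 s, parallel non-persistent ≈ 1.04 s, before idle times are added.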
HTTP Response time (in seconds)
RTT = 100 msec, O = 5 Kbytes, M=10 and X=5

[Figure: response time (0 to 20 seconds) of non-persistent, persistent, and parallel non-persistent HTTP at 28 Kbps, 100 Kbps, 1 Mbps, and 10 Mbps]

For low bandwidth, connection & response time dominated by transmission time.
Persistent connections only give minor improvement over parallel connections.
HTTP Response time (in seconds)
RTT = 1 sec, O = 5 Kbytes, M=10 and X=5

[Figure: response time (0 to 70 seconds) of non-persistent, persistent, and parallel non-persistent HTTP at 28 Kbps, 100 Kbps, 1 Mbps, and 10 Mbps]

For larger RTT, response time dominated by TCP establishment & slow start delays. Persistent connections now give important improvement: particularly in high delay·bandwidth networks.
Issues to Think About

• What about short flows? (setting initial cwnd)
  - most flows are short
  - most bytes are in long flows

• How does this work over wireless links?
  - packet reordering fools fast retransmit
  - loss not always congestion related

• High speeds?
  - to reach 10 Gbps, packet losses occur every 90 minutes!

• Fairness: how do flows with different RTTs share link?
Security issues with TCP

• Example attacks:
  - Sequence number spoofing
  - Routing attacks
  - Source address spoofing
  - Authentication attacks
Network Layer

goals:
• understand principles behind network layer services:
  - routing (path selection)
  - dealing with scale
  - how a router works
  - advanced topics: IPv6, mobility
• instantiation and implementation in the Internet
Network layer
• transport segment from sending to receiving host
• on sending side encapsulates segments into datagrams
• on rcving side, delivers segments to transport layer
• network layer protocols in every host, router
• Router examines header fields in all IP datagrams passing through it

[Figure: end hosts run the full stack (application, transport, network, data link, physical); routers along the path implement only the network, data link, and physical layers]
Key Network-Layer Functions

• forwarding: move packets from router's input to appropriate router output
• routing: determine route taken by packets from source to dest.
  - Routing algorithms

analogy:
• routing: process of planning trip from source to dest
• forwarding: process of getting through single interchange
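Interplay between routing and forwarding

routing algorithm
  ↓
local forwarding table

header value | output link
0100 | 3
0101 | 2
0111 | 2
1001 | 1

value in arriving packet's header: 0111

[Figure: the routing algorithm fills in the router's local forwarding table; an arriving packet with header value 0111 is switched to one of the router's output links 1, 2, 3 according to the table]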
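The forwarding step on this slide is a table lookup keyed by the header value. A minimal sketch using the slide's four entries; returning -1 for a missing entry is a hypothetical convention, not something the slide specifies.

```java
import java.util.Map;

/** Sketch of the slide's local forwarding table: exact match on a 4-bit header value. */
final class ForwardingTable {
    static final Map<String, Integer> TABLE = Map.of(
        "0100", 3,
        "0101", 2,
        "0111", 2,
        "1001", 1);

    /** Output link for a header value, or -1 (hypothetical "no entry" marker). */
    static int outputLink(String headerValue) {
        return TABLE.getOrDefault(headerValue, -1);
    }
}
```

The arriving packet's header value 0111 maps to output link 2.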
Connection setup

• 3rd important function in some network architectures:
  - ATM, frame relay, X.25
• Before datagrams flow, two hosts and intervening routers establish virtual connection
  - Routers get involved
• Network and transport layer connection service:
  - Network: between two hosts
  - Transport: between two processes
Network service model
Q: What service model for "channel" transporting datagrams from sender to rcvr?

Example services for individual datagrams:
• guaranteed delivery
• guaranteed delivery with less than 40 msec delay

Example services for a flow of datagrams:
• in-order datagram delivery
• guaranteed minimum bandwidth to flow
• restrictions on changes in inter-packet spacing
Network layer service models:

Network Architecture | Service Model | Bandwidth | Loss | Order | Timing | Congestion feedback
Internet | best effort | none | no | no | no | no (inferred via loss)
ATM | CBR | constant rate | yes | yes | yes | no congestion
ATM | VBR | guaranteed rate | yes | yes | yes | no congestion
ATM | ABR | guaranteed minimum | no | yes | no | yes
ATM | UBR | none | no | yes | no | no

(The Bandwidth, Loss, Order, and Timing columns answer "Guarantees?")
Network layer connection and connection-less service

• Datagram network provides network-layer connectionless service
• VC network provides network-layer connection service
• Analogous to the transport-layer services, but:
  - Service: host-to-host
  - No choice: network provides one or the other
  - Implementation: in the core
Virtual circuits
"source-to-dest path behaves much like telephone circuit"
  - performance-wise
  - network actions along source-to-dest path

• call setup, teardown for each call before data can flow
• each packet carries VC identifier (not destination host address)
• every router on source-dest path maintains "state" for each passing connection
• link, router resources (bandwidth, buffers) may be allocated to VC
VC implementation

A VC consists of:
1. Path from source to destination
2. VC numbers, one number for each link along path
3. Entries in forwarding tables in routers along path
• Packet belonging to VC carries a VC number.
• VC number must be changed on each link.
  - New VC number comes from forwarding table
Forwarding table

[Figure: a VC crossing a router with interfaces numbered 1, 2, 3; the VC number changes on each link: 12, 22, 32]

Forwarding table in northwest router:

Incoming interface | Incoming VC # | Outgoing interface | Outgoing VC #
1 | 12 | 2 | 22
2 | 63 | 1 | 18
3 | 7 | 2 | 17
1 | 97 | 3 | 87
… | … | … | …

Routers maintain connection state information!
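The per-link VC renumbering can be sketched as a map from (incoming interface, incoming VC #) to (outgoing interface, outgoing VC #), filled with the slide's rows. The names are hypothetical, null stands for "no VC set up", and the nested records need Java 16+.

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch of the northwest router's VC table: (in iface, in VC#) -> (out iface, out VC#). */
final class VcForwarding {
    record Key(int inInterface, int inVc) {}
    record Out(int outInterface, int outVc) {}

    static final Map<Key, Out> TABLE = new HashMap<>();
    static {
        TABLE.put(new Key(1, 12), new Out(2, 22));
        TABLE.put(new Key(2, 63), new Out(1, 18));
        TABLE.put(new Key(3, 7), new Out(2, 17));
        TABLE.put(new Key(1, 97), new Out(3, 87));
    }

    /** The packet leaves with the new VC number from the table; null if no such VC is set up. */
    static Out forward(int inInterface, int inVc) {
        return TABLE.get(new Key(inInterface, inVc));
    }
}
```

A packet arriving on interface 1 with VC # 12 leaves on interface 2 relabeled as VC # 22. This per-VC entry is the connection state the slide refers to.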
Virtual circuits: signaling protocols

• used to setup, maintain, teardown VC
• used in ATM, frame-relay, X.25
• not used in today's Internet

[Figure: calling and called hosts (application, transport, network, data link, physical): 1. Initiate call, 2. incoming call, 3. Accept call, 4. Call connected, 5. Data flow begins, 6. Receive data]
Datagram networks
• no call setup at network layer
• routers: no state about end-to-end connections
  - no network-level concept of "connection"
• packets forwarded using destination host address
  - packets between same source-dest pair may take different paths

[Figure: two hosts (application, transport, network, data link, physical): 1. Send data, 2. Receive data]
Forwarding table (4 billion possible entries)

Destination Address Range | Link Interface
11001000 00010111 00010000 00000000 through 11001000 00010111 00010111 11111111 | 0
11001000 00010111 00011000 00000000 through 11001000 00010111 00011000 11111111 | 1
11001000 00010111 00011001 00000000 through 11001000 00010111 00011111 11111111 | 2
otherwise | 3
Longest prefix matching

Prefix Match | Link Interface
11001000 00010111 00010 | 0
11001000 00010111 00011000 | 1
11001000 00010111 00011 | 2
otherwise | 3

Examples

DA: 11001000 00010111 00010110 10100001   Which interface?

DA: 11001000 00010111 00011000 10101010   Which interface?
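Longest prefix matching can be sketched as a linear scan that keeps the longest table prefix the address starts with. Addresses and prefixes are bit strings with spaces removed; a real router uses specialized structures (tries, TCAMs) rather than a scan, so this only illustrates the matching rule.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Sketch of longest prefix matching over the slide's table (bit strings, linear scan). */
final class LongestPrefixMatch {
    static final Map<String, Integer> TABLE = new LinkedHashMap<>(); // prefix -> link interface
    static {
        TABLE.put("110010000001011100010", 0);    // 11001000 00010111 00010
        TABLE.put("110010000001011100011000", 1); // 11001000 00010111 00011000
        TABLE.put("110010000001011100011", 2);    // 11001000 00010111 00011
    }
    static final int DEFAULT_INTERFACE = 3;       // "otherwise"

    static int lookup(String destAddress) {
        String bits = destAddress.replace(" ", "");
        int bestLen = -1;
        int best = DEFAULT_INTERFACE;
        for (Map.Entry<String, Integer> e : TABLE.entrySet()) {
            String prefix = e.getKey();
            if (bits.startsWith(prefix) && prefix.length() > bestLen) {
                bestLen = prefix.length(); // keep the longest matching prefix
                best = e.getValue();
            }
        }
        return best;
    }
}
```

For the two example addresses this sketch returns interface 0 and interface 1; the second address matches both the 21-bit prefix of interface 2 and the 24-bit prefix of interface 1, and the longer one wins.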
Datagram or VC network: why?

Internet
• data exchange among computers
  - "elastic" service, no strict timing req.
• "smart" end systems (computers)
  - can adapt, perform control, error recovery
  - simple inside network, complexity at "edge"
• many link types
  - different characteristics
  - uniform service difficult

ATM
• evolved from telephony
• human conversation:
  - strict timing, reliability requirements
  - need for guaranteed service
• "dumb" end systems
  - telephones
  - complexity inside network
