IRI - Chap3
Networking
Guy Leduc
Transport layer: overview
Our goal:
§ understand principles behind transport layer services:
• multiplexing, demultiplexing
• reliable data transfer
• flow control
• congestion control
§ learn about Internet transport layer protocols:
• UDP: connectionless transport
• TCP: connection-oriented reliable transport
• TCP congestion control
§ two transport protocols available to Internet applications
• TCP, UDP
[Figure: logical end-end transport between hosts across a home network, a local or regional ISP, an enterprise network, a content provider network and a datacenter network; protocol stack: application, transport, network, data link, physical]
Sender:
§ is passed an application-layer message
§ determines segment header fields values
§ creates segment
Receiver:
§ receives segment from IP
§ checks header values
§ extracts application-layer message
§ demultiplexes message up to application via socket
[Figure: sender and receiver stacks (application, transport, network (IP), link, physical); the transport header Th is prepended to the app. msg]
• congestion control
• flow control
• connection setup
§ UDP: User Datagram Protocol
• unreliable, unordered delivery
• no-frills extension of “best-effort” IP
[Figure: logical end-end transport between hosts across a home network, a local or regional ISP, a content provider network and a datacenter network]
[Figure: encapsulation across the stacks (application, transport, network, link, physical): the HTTP msg is carried with transport header Ht and network header Hn]
Three TCP segments, all destined to IP address B and dest port 80 are demultiplexed to
different sockets by using source IP and source port #
© From Computer Networking, by Kurose&Ross Transport Layer: 3-20
Summary
§ Multiplexing, demultiplexing: based on segment and packet
header field values
§ UDP: demultiplexing using destination IP address and
destination port number (only)
§ TCP: demultiplexing using 4-tuple: source and destination IP
addresses, and source and destination port numbers
§ Some form of multiplexing/demultiplexing happens at all layers:
• For example, in the network layer, demultiplexing is used to deliver the IP packet payload either to TCP or UDP
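The difference between the two lookup keys in the summary can be sketched in a few lines. This is a minimal illustration, not how an operating system implements it: the socket tables, the helper names `udp_demux` and `tcp_demux`, and the host names and port numbers are all assumptions made for the example.

```python
from typing import Dict, Optional, Tuple

# Hypothetical socket tables (illustrative names, not a real OS structure).
udp_sockets: Dict[Tuple[str, int], str] = {}
tcp_sockets: Dict[Tuple[str, int, str, int], str] = {}


def udp_demux(dst_ip: str, dst_port: int) -> Optional[str]:
    """UDP: the destination IP address and port alone select the socket."""
    return udp_sockets.get((dst_ip, dst_port))


def tcp_demux(src_ip: str, src_port: int,
              dst_ip: str, dst_port: int) -> Optional[str]:
    """TCP: the full 4-tuple selects the (connected) socket."""
    return tcp_sockets.get((src_ip, src_port, dst_ip, dst_port))


# Arbitrary example values: one UDP socket and three TCP connections on host B.
udp_sockets[("B", 53)] = "udp-socket"
tcp_sockets[("A", 9157, "B", 80)] = "tcp-socket-1"
tcp_sockets[("C", 9157, "B", 80)] = "tcp-socket-2"
tcp_sockets[("C", 5775, "B", 80)] = "tcp-socket-3"
```

All three TCP entries share destination B and port 80, yet each maps to its own socket because the source IP address or source port differs. The UDP lookup ignores the source entirely.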
[Figure: UDP between two hosts (application, transport (UDP), link, physical); UDP segment format, carrying data to/from application layer]
Transmitted: 5 6 11
Received: 4 6 11
receiver-computed checksum ≠ sender-computed checksum (as received)
sum 1 0 1 1 1 0 1 1 1 0 1 1 1 1 0 0
checksum 0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 1
Note: when adding numbers, a carryout from the most significant bit needs to be
added to the result (this is one’s complement addition)
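The wrap-around addition described in the note can be sketched as follows. The two 16-bit operands are an assumption: they do not appear above and were chosen so that the result reproduces the sum and checksum shown. The function names are illustrative.

```python
def ones_complement_sum(words):
    """Add 16-bit words, folding any carry out of bit 15 back into the result."""
    total = 0
    for word in words:
        total += word
        total = (total & 0xFFFF) + (total >> 16)  # wrap the carry around
    return total


def internet_checksum(words):
    """The checksum is the one's complement (bit inversion) of the sum."""
    return ~ones_complement_sum(words) & 0xFFFF


# Assumed operands (not given in the text above).
operands = [0b1110011001100110, 0b1101010101010101]
total = ones_complement_sum(operands)
checksum = internet_checksum(operands)
print(f"sum      {total:016b}")
print(f"checksum {checksum:016b}")
```

A receiver that adds the same words plus the received checksum should obtain sixteen 1 bits. Any other result signals an error.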
[Figure: reliable service abstraction: the sending process and the receiving process exchange data over a reliable channel]
[Figure: reliable service implementation: sender-side and receiver-side of the reliable data transfer protocol (transport) run over an unreliable channel (network)]
Complexity of reliable data transfer protocol will depend (strongly) on characteristics of unreliable channel (lose, corrupt, reorder data?)
Sender, receiver do not know the “state” of each other, e.g., was a message received?
§ unless communicated via a message
udt_send(): called by rdt to transfer packet over unreliable channel to receiver
rdt_rcv(): called when packet arrives on receiver side of channel
Bi-directional communication over unreliable channel
Reliable data transfer: getting started
We will:
§ incrementally develop sender, receiver sides of reliable data transfer
protocol (rdt)
§ consider only unidirectional data transfer
• but control info will flow in both directions!
§ use finite state machines (FSM) to specify sender, receiver
state: when in this “state”, next state uniquely determined by next event
[Figure: state 1 → state 2; the transition is labelled with the event causing state transition, above the actions taken on state transition]
Note: such race conditions are only possible over full-duplex channels
rdt3.2: adding seq # in ACKs
receiver must specify seq # of pkt being ACKed
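[Figure: rdt3.2 timeline. pkt(0) carrying A is sent: “A received”, ACK(0). A timeout triggers a resend of pkt(0): “Discard (no dupl)”, “Resend ACK”, ACK(0). pkt(1) carrying B is lost (X): “B not ACKed”. After a timeout pkt(1) is resent and acknowledged with ACK(1): “B recovered”.]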
rdt3.2 sender
rdt_send(data)
sndpkt = make_pkt(0, data, checksum)
udt_send(sndpkt)
start_timer
rdt_send(data)
sndpkt = make_pkt(1, data, checksum)
udt_send(sndpkt)
start_timer
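The two `rdt_send` transitions above differ only in the sequence bit, so a sender can be sketched with a single alternating bit. This is a simplified model under stated assumptions: packets are plain tuples without a checksum, the timer is a boolean flag rather than a real clock, and the class and method names other than `rdt_send` and `udt_send` are hypothetical.

```python
class StopAndWaitSender:
    """Alternating-bit sender sketch; not a full rdt3.x implementation."""

    def __init__(self, udt_send):
        self.udt_send = udt_send    # callback standing in for the unreliable channel
        self.seq = 0                # sequence bit of the next new packet
        self.sndpkt = None          # None means "wait for call from above"
        self.timer_running = False  # simplified timer: a flag, not a clock

    def rdt_send(self, data):
        """Accept data from above only when no packet is outstanding."""
        if self.sndpkt is not None:
            return False
        self.sndpkt = (self.seq, data)  # stands in for make_pkt(seq, data, checksum)
        self.udt_send(self.sndpkt)
        self.timer_running = True       # start_timer
        return True

    def timeout(self):
        """On timeout, resend the outstanding packet and restart the timer."""
        if self.sndpkt is not None:
            self.udt_send(self.sndpkt)
            self.timer_running = True

    def rdt_rcv_ack(self, ack_seq, corrupt=False):
        """Ignore corrupt or unexpected ACKs; otherwise flip the sequence bit."""
        if self.sndpkt is None or corrupt or ack_seq != self.seq:
            return
        self.timer_running = False      # stop_timer
        self.sndpkt = None
        self.seq ^= 1


sent = []
sender = StopAndWaitSender(sent.append)
```

Passing `sent.append` as the channel records every transmission, which makes the retransmission on timeout easy to observe.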
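[Figure: pkt reordering. pkt(0) times out and is resent; the copies of pkt(0) are reordered in the network (this delay can be arbitrarily small) and ACK(0) is returned. pkt(1) and ACK(1) follow. The next pkt(0) is lost (X), but the delayed old pkt(0) arrives and triggers ACK(0): duplicate, but considered as new pkt(0)! The new pkt(0) is ACKed but lost!]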
Solution: Choose timeout so large that when a pkt is retransmitted the sender is sure that
the previous copy of this pkt and its ACK have disappeared from the network.
Better solution: Use a much larger seq# space (see later).
Performance of rdt3.2/3.3 (stop-and-wait)
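§ U_sender: utilization, the fraction of time sender busy sending

$$U_{sender} = \frac{L/R}{RTT + L/R} = \frac{0.008}{30.008} = 0.00027$$

With three packets sent per RTT (pipelining), utilization triples:

$$U_{sender} = \frac{3L/R}{RTT + L/R} = \frac{0.024}{30.008} \approx 0.0008$$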
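The utilization figures can be checked numerically. The sketch below assumes only that the transmission time L/R and the RTT are expressed in the same time unit; the function and parameter names are illustrative.

```python
def sender_utilization(transmit_time, rtt, packets_in_flight=1):
    """Fraction of time the sender is busy: n * (L/R) / (RTT + L/R)."""
    return packets_in_flight * transmit_time / (rtt + transmit_time)


stop_and_wait = sender_utilization(0.008, 30.0)
three_in_flight = sender_utilization(0.008, 30.0, packets_in_flight=3)
print(f"stop-and-wait:      {stop_and_wait:.5f}")
print(f"3 packets in flight: {three_in_flight:.5f}")
```

With RTT this much larger than L/R, utilization grows almost linearly with the number of packets in flight, which is the motivation for pipelining.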
Sender: pkt 2 timeout; send pkt2 (but not 3,4,5)
Receiver: rcv pkt2; send ack2; deliver pkt2, pkt3, pkt4, pkt5
Selective repeat: a dilemma!
example:
§ seq #s: 0, 1, 2, 3 (base 4 counting)
§ window size=3
[Figure: sender window (after receipt) and receiver window (after receipt), in two scenarios.
(a) no problem: pkt0, pkt1 and pkt2 are received and acknowledged, pkt3 is lost (X), and the receiver will accept packet with seq number 0 when the new pkt0 arrives.
(b) oops!: pkt0, pkt1 and pkt2 are received but their ACKs are lost (X); after a timeout the sender retransmits pkt0, and the receiver will accept packet with seq number 0.]
§ receiver can’t see sender side
§ receiver behavior identical in both cases!
§ something’s (very) wrong!
Q: what relationship is needed between sequence # size and window size?
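One way to reason about the question is to compare the sequence numbers the sender may still retransmit with those the receiver will accept as new once it has received a full window. The sketch below assumes a channel that does not reorder packets; the function name and parameters are illustrative.

```python
def sr_window_is_safe(seq_space, window):
    """True if a retransmitted old packet can never be mistaken for a new one.

    old: sequence numbers of the first window, which the sender may still
         retransmit if all ACKs were lost.
    new: sequence numbers the receiver accepts after receiving that window.
    """
    old = {i % seq_space for i in range(window)}
    new = {i % seq_space for i in range(window, 2 * window)}
    return old.isdisjoint(new)


print(sr_window_is_safe(4, 3))  # the example above: seq #s 0..3, window size 3
print(sr_window_is_safe(4, 2))
```

The two sets overlap exactly when the window exceeds half the sequence number space, which is what goes wrong in scenario (b).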
[Figure: receiver sequence numbers 0 1 2 3 0 1 2 3 …, with windows W=[1..3] and then, after pkt3, W=[0]; a delayed pkt(0) then arrives: duplicate, but considered as new pkt(0)!]
Solution: Choose timeout so large that when a pkt is retransmitted the sender is sure that
the previous copy of this pkt and its ACK have disappeared from the network.
Better solution: Use a huge seq# space (K very large), keep N much smaller than K-1, and rely on the underlying network to ensure packets and ACKs do not live too long.
SR, when pkt or ACK reordering possible
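K = 4, N = 2
[Figure: sender and receiver both start with W=[0..1]. pkt0 and pkt1 are sent; the receiver window advances to W=[1..2], then W=[2..3]. pkt0 is resent and reordered in the network (pkt reordering) while Ack0 moves the sender window to W=[1..2]. After pkt2 the receiver window is W=[3..0], so the delayed pkt0 arrives as a duplicate, but is considered as new pkt(0)!]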
Solution: Choose timeout so large that when a pkt is retransmitted the sender is sure that
the previous copy of this pkt and its ACK have disappeared from the network.
Better solution: Use a huge seq# space (K very large), keep N much smaller than K/2, and rely on the underlying network to ensure packets and ACKs do not live too long.
Chapter 3: roadmap
§ Transport-layer services
§ Multiplexing and demultiplexing
§ Connectionless transport: UDP
§ Principles of reliable data transfer
§ Connection-oriented transport: TCP
• segment structure
• reliable data transfer
• flow control
• connection management
§ TCP congestion control
[Figure: TCP sequence numbers and acknowledgements; sender window size N]
User types ‘C’
→ Seq=42, ACK=79, data = ‘C’
host ACKs receipt of ‘C’, echoes back ‘C’
← Seq=79, ACK=43, data = ‘C’
host ACKs receipt of echoed ‘C’
→ Seq=43, ACK=80
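The numbers in this exchange follow one rule: a reply uses the peer's ACK number as its own sequence number, and acknowledges the peer's sequence number plus the number of data bytes received. A minimal sketch, in which segments are modelled as `(seq, ack, data)` tuples and `reply_to` is a hypothetical helper:

```python
def reply_to(segment, reply_data=b""):
    """Build the (seq, ack, data) reply to a received (seq, ack, data) segment."""
    seq, ack, data = segment
    return (ack, seq + len(data), reply_data)


typed = (42, 79, b"C")          # user types 'C'
echoed = reply_to(typed, b"C")  # host ACKs receipt of 'C', echoes back 'C'
final = reply_to(echoed)        # host ACKs receipt of echoed 'C'
print(typed, echoed, final)
```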
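[Figure: TCP retransmission scenarios. With SendBase=92, “Seq=92, 8 bytes of data” is sent and its ACK=100 is lost (X); after the timeout the segment is retransmitted and ACK=100 is received. In a second scenario a timeout also occurs, and a cumulative ACK=120 moves SendBase to 120.]
[Figure: receipt of three duplicate ACKs (ACK=100) before the timeout expires]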
[Figure: receiver protocol stack: the network layer (IP code) delivering IP packet payload from sender into TCP socket buffers (TCP code)]
receive window
flow control: # bytes receiver willing to accept from sender
Nagle algorithm
§ When TCP receives data from socket one byte at a time, or more generally in units much smaller than the TCP Maximum Segment Size (MSS)
• send small packet immediately, and buffer all the rest until the outstanding bytes are acknowledged
• send other segments only when all previous bytes are acked (or when MSS bytes have been buffered, or when previous segment was full size)
§ Useful for Telnet for example. Otherwise:
• 41-byte segments containing 1 byte of data
• 40 bytes of TCP/IP header overhead
§ Nagle can be disabled if needed
• See Socket options: TCP_NODELAY
[Figure: Host A receives a few bytes at a time from the application. A small packet with few bytes is sent and received by Host B. Nagle does not allow Host A to send other bytes (unless MSS is reached) until the ACK arrives; then a packet with all buffered bytes is sent.]
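As an illustration of the socket option, the sketch below disables Nagle's algorithm on a TCP socket using Python's standard `socket` module. The helper name `make_low_latency_socket` is hypothetical, and no connection is opened.

```python
import socket


def make_low_latency_socket():
    """Create a TCP socket with Nagle's algorithm disabled (TCP_NODELAY set)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock


sock = make_low_latency_socket()
# Read the option back; a non-zero value means Nagle is disabled.
nodelay_enabled = sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0
sock.close()
```

Disabling Nagle trades the header overhead discussed above for lower latency, so it tends to suit interactive traffic more than bulk transfer.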
Silly window syndrome
Still a problem with applications reading one byte at a time
[Figure: Receiver’s buffer is full; application reads 1 byte]
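[Figure: 2-way handshake, no problem! The client chooses x and sends req_conn(x); the server enters ESTAB and answers acc_conn(x); the client enters ESTAB. data(x+1) is accepted and acknowledged with ACK(x+1); connection x completes.]
[Figure: 2-way handshake with a retransmitted request. The client chooses x, sends req_conn(x) and retransmits req_conn(x); the server enters ESTAB and answers acc_conn(x); the client enters ESTAB. Connection x completes, the client terminates and the server forgets x. The retransmitted req_conn(x) then arrives and the server enters ESTAB again.]
[Figure: 2-way handshake with retransmitted request and data. As above, but data(x+1) is also retransmitted. After connection x completes, the client terminates and the server forgets x, the old req_conn(x) puts the server in ESTAB and the old data(x+1) is accepted. Problem: duplicate data accepted!]
[Figure: fix with a server-chosen number. The server chooses y’, which must not have been used in recent past! (Here “recent” means a maximum packet life time.) It answers SYNACK(y’, x+1). An old ACK(y+1) and data(x+1) is rejected. Reject ACK: y ≠ y’]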
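[Figure: TCP 3-way handshake FSM]
Server: Socket connectionSocket = welcomeSocket.accept();
Client: Socket clientSocket = new Socket("hostname","port number");
Client: send SYN(seq=x), enter SYN sent
Server (listen): receive SYN(x); send SYNACK(seq=y,ACKnum=x+1); create new socket for communication back to client; enter SYN rcvd
Client (SYN sent): receive SYNACK(seq=y,ACKnum=x+1); send ACK(ACKnum=y+1); enter ESTAB
Server (SYN rcvd): receive ACK(ACKnum=y+1); enter ESTAB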
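The numbering rules of the three segments, and the server-side check that rejects an ACK for a number it did not choose, can be sketched as below. Segments are modelled as plain dictionaries and both function names are hypothetical. Real TCP also negotiates options and manages state that this sketch leaves out.

```python
def three_way_handshake(x, y):
    """Return the SYN, SYNACK and ACK segments for initial numbers x and y."""
    syn = {"flags": "SYN", "seq": x}
    synack = {"flags": "SYNACK", "seq": y, "ack": syn["seq"] + 1}
    ack = {"flags": "ACK", "ack": synack["seq"] + 1}
    return [syn, synack, ack]


def server_accepts_ack(ack_segment, y_chosen):
    """The server only accepts an ACK that acknowledges the y it chose."""
    return ack_segment["ack"] == y_chosen + 1


segments = three_way_handshake(x=100, y=300)
for segment in segments:
    print(segment)
```

An old ACK from an earlier connection acknowledges a different y, so the check fails and the duplicate is rejected.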
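[Figure: TCP connection closing FSM.
Active close: close, send FIN → FIN WAIT 1 → (receive ACK) → FIN WAIT 2 → (receive FIN, send ACK) → TIME WAIT → wait 2 * MSL → close socket → CLOSED.
Simultaneous CLOSE: FIN WAIT 1 → (receive FIN, send ACK) → CLOSING → (receive ACK) → TIME WAIT; receiving FIN+ACK in FIN WAIT 1 leads directly to TIME WAIT.
Passive close: receive FIN, send ACK → CLOSE WAIT → connectionSocket.close(), send FIN → LAST ACK → (receive ACK) → close socket → CLOSED.]
After a socket “close”, TCP continues to send the previous bytes reliably: all pending data are still sent reliably.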
The side that actively closes the socket waits twice the MSL (safety margin). Therefore the same TCP 4-tuple cannot be reused sooner (e.g. with the same client port#). This ensures that no duplicate segment from an earlier connection with the same 4-tuple can jump into the current connection.
Chapter 3: roadmap
§ Transport-layer services
§ Multiplexing and demultiplexing
§ Connectionless transport: UDP
§ Principles of reliable data transfer
§ Connection-oriented transport: TCP
§ TCP congestion control
§ Evolution of transport-layer
functionality
AIMD sawtooth
behavior: probing
for bandwidth
• initially cwnd = 1 MSS (Maximum Segment Size)
• double cwnd every RTT
• done by incrementing cwnd for every ACK received
[Figure: one segment in the first RTT, two segments in the second, four segments in the third]
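The link between "incrementing cwnd for every ACK received" and "double cwnd every RTT" can be seen in a small simulation. It assumes one ACK per segment, no loss and no threshold, and the function name is illustrative.

```python
def slow_start(rtts, mss=1):
    """Return cwnd (in units of mss) at the start of each RTT, loss-free."""
    cwnd = 1 * mss
    history = [cwnd]
    for _ in range(rtts):
        acks = cwnd // mss      # assumption: one ACK per segment sent this RTT
        for _ in range(acks):
            cwnd += mss         # increment cwnd for every ACK received
        history.append(cwnd)
    return history


print(slow_start(3))
```

Each segment's ACK adds one MSS, so a window of n segments yields n increments and the window doubles within one RTT.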
[Figure fragment: fast recovery state. On a duplicate ACK: cwnd = cwnd + MSS; transmit new segment(s), as allowed]
[Figure: AIMD sawtooth starting from W/2 in each cycle; a cycle lasts W/2 · RTT and carries 3W/4 · W/2 MSS]
TCP goodput in steady state (2)
§ Average window size (in MSS) = $3W/4$
§ Number of MSS per cycle = $3W/4 \cdot W/2 = 3W^2/8$
§ If we assume one packet loss per cycle, we have $p = 1 / (3W^2/8)$
• where p is the packet loss ratio
§ So
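Solving the loss relation for W and substituting it into the average window gives a goodput estimate. This is a worked derivation under the same assumptions (one packet loss per cycle, sawtooth with average window 3W/4), not necessarily the exact form used in the original slide:

```latex
p = \frac{8}{3W^2}
\;\Longrightarrow\;
W = \sqrt{\frac{8}{3p}}

\text{goodput}
= \frac{3W}{4}\cdot\frac{MSS}{RTT}
= \sqrt{\frac{9}{16}\cdot\frac{8}{3p}}\cdot\frac{MSS}{RTT}
= \sqrt{\frac{3}{2}}\cdot\frac{MSS}{RTT\sqrt{p}}
\approx \frac{1.22\,MSS}{RTT\sqrt{p}}
```

Goodput is therefore inversely proportional to both the RTT and the square root of the loss ratio.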
[Figure: TCP connection 1 and TCP connection 2 share a bottleneck router of capacity R; plot axis: Connection 1 throughput, from 0 to R]
And when RTTs are different?
§ If RTT of connection 2 = 2 x RTT of connection 1
§ Connection 1 ramps up twice as quickly
TCP goodput:
[Figure: plot axis: Connection 1 throughput, from 0 to R]
[Figure: connection establishment over IP. With TCP, a TCP handshake (transport layer) is followed by a TLS handshake (security) before data is sent. With QUIC, a single QUIC handshake precedes the data.]
[Figure: (a) HTTP 1.1 and HTTP/2 with TCP and TLS: the HTTP GETs share one TLS encryption and one RDT stream, so an error delays all of them. (b) HTTP/3: HTTP/2 with QUIC: each GET has its own QUIC encrypt and QUIC RDT, so an error affects only its own stream: no HOL blocking]