TCP Refresher - Part 1
TCP Refresher - Part 1
TCP Refresher - Part 1
Part-1
Disclaimer
• This Presentation contains a mixture of self made slides and material
already out on the Internet.
• The goal of this presentation/talk is to provide TCP refresher.
• At the end you wont know everything about TCP.
• The goal here is to introduce you to the Mountains by climbing on top
of a hill. How to climb the Mountains is an exercise left for the
audience/reader .
• My hope is that you will get at least something new out of this but if
you already knew everything then at least you got the free Lunch .
Problem Statement
• So assume that we have to develop a protocol which is responsible for
data transfer over a medium that is lossy. This protocol needs to
provide mechanism to transfer data reliably.
• If an error occurs during the data transfer there are primarily two ways
it can be corrected:
• Error Correction Code (FEC): Shannon theory forms the base and mostly used
in low level protocols like Optical, Memory chips etc..
• Automatic Repeat Request (ARQ): Basically brute force method where you try
to send the data again. TCP uses this form to provide reliability.
Reliability and Error Recovery
• General ARQ Algorithms
• Stop & Wait
• Sliding Window Protocols
• Go-Back-N
• Selective Repeat
4
Error Recovery: Stop and Wait
• ARQ
– Receiver sends acknowledgement (ACK)
when it receives packet
– Sender waits for ACK and timeouts if it Sender Receiver
does not arrive within some time period
• Simplest ARQ protocol
Timeout
• Send a packet, stop and wait
until ACK arrives
Time
Recovering from Error
Timeout
Timeout
Timeout
Time
Timeout
Timeout
Timeout
ACK lost Packet lost Early timeout
DUPLICATE DUPLICATE
PACKETS!!! PACKETS!!!
Problems with Stop and Wait
• How do I recognize a duplicate packet ? (Generic Problem)
• Performance issues:
• Can only send one packet per round trip
• Question: How long the sender should wait for an ACK ??
• This turns out to be a hard problem to solve. Will look later.
How to Recognize Duplicates?
• Use sequence numbers
• both packets and acks
• Sequence # in packet is finite How big
should it be?
• For stop and wait?
• One bit – won’t send seq #1 until
received ACK for seq #0
Got Reliability but what about efficiency?
• So we found a way to achieve reliability in our protocol but sending a
packet and then waiting for an ACK isn't very efficient.
• If a packet is lost, we have to send that data again which brings our
“GoodPut” further down..
Got Reliability but what about efficiency?
Sender Receiver
first packet bit
transmitted, t = 0
• Can’t keep the pipe full: Utilization is low when bandwidth-delay product (R x RTT)is large!.
R is the link capacity and RTT is the round-trip
• What is BDP ?: It’s a measure of capacity of the network in bits. Or in other words, amount
of data on the network at any given time, i.e. data that has been transmitted but not yet
acknowledged.
So how can we make it efficient ?
• By letting sender sending more packets in the network at a time without waiting for
first to be acked.
• Number of pkts in flight = window
So how can we make it efficient ?
Pipelining: sender allows multiple, “in-flight”, yet-to-be-acknowledged data
segments
• range of sequence numbers must be increased
• buffering at sender and/or receiver
Now things are getting complicated, isn’t it ?
• Sender Side Complexity:
• Sender needs to decide when to inject packets into the network ?
• How many packets do I need to inject into the network ?
• Need to keep the timers when waiting for ACKs and keep a copy of the un-acked
packets in case retransmission is needed.
• Receiver Side Complexity:
• It needs to have a logic for ACK to distinguish which packets have been received
and which have not.
• Needs a buffer which allows to hold “out-of-sequence” packets (due to re-
ordering), unless it wants to throw those packets which will be very inefficient
(remember our goal ?, that efficiency thing we were talking about)
• Sender and Receiver speed mismatch (Flow control)
• Network can be overwhelmed by the sender (Congestion control)
Introducing Windows of Packet(Sliding Windows)
• Define a Window of packets that have been injected but not yet
ACK’d. (window size is the number of packets unack’d)
• We will slide the window forward as the packets get acknowledged
(Sliding Window).
Introducing Windows of Packet(Sliding Windows)
• Sliding window will be kept as a data structure on both Sender and
Receiver.
• It allows the sender to keep track on what packets can be released,
packets waiting for ACKs, and packets which cannot yet be sent.
• It allows the receiver to keep track of packets already received and
ACK’d.
• This structure looks good and promising but where is the guideline
for:
• How large the window should be ?
• What if the receiver can not handle the sender data rate ?
We also need to worry about Flow Control
• Flow Control is the problem which arises when there is a mismatch
between Sender and Receiver Data rate.
• Window based flow control is used to solve the flow control problem.
• In this approach Window size isn’t fixed and varies over time.
• In order to make this work a Receiver needs to signal back the Sender
on how large a Window to use. This is known as Window Update.
• Technically an ACK sent by a receiver is different then Window Update
but in practice it piggy backs on the ACK packets.
• As we can see that this approach clearly allows the sender to control
the number of packets which can be injected by controlling the
window size.
Flow Control
• Assume that the sender is allowed to inject “W” packets into the
network before it hears an ACK for any of them.
• Assume both Sender and Receiver are sufficiently fast and the
network has no Loss with infinite capacity.
• Then the Throughput = (SW/R) where W is the window size, S is the
packet size in bits, and R is the RTT.
• By controlling the size of “W” we can control the sender throughput.
• Now time to throw another monkey wrench:
• What about the network carrying traffic ? It is possible for the Sender to inject
packets which can exceed routers ability causing packet loss.
• This problem is called Congestion Control
Congestion Control
• This problem involves the sender slowing down as to not overwhelm
the network between Sender and Receiver
• MSS option is carried within the SYN segment. Default is 536 bytes.
Typical value we see is 1460 bytes
TCP Options: Window Scale
• Without Window Scaling, max amount of data which can be sent is
216 = 65535 bytes (65KB).
• WS option effectively increases the capacity of the TCP window
advertisement field from 16 to 30 bits.
• Window Scale option applies a scaling factor the 16 bit Window
advertisement value.
• A shift count of 0 indicates no scaling and maximum scale value is 14
which provides (65535 x 214 ), effectively 1GB.
• This option can only appear in SYN segments, so the scale factor is
fixed in each direction when the connection is established.
• Sample PCAP
TCP Options: Timestamp Options
• Timestamp option allows the sender place two 4-byte timestamp
values in every segment.
• Receiver reflects these values in the ACK, allowing the sender to
calculate an estimate of the connection’s RTT for each ACK received.
• Sender puts a 32 bit value field call TSV or Tsval and the receiver
echoes this back unchanged in the second Timestamp Echo Retry
(TSER)
• This option serves two purpose:
• Better estimation of RTT which goes as a factor for RTO. (We will later dig
deeper)
• It also helps in Protection against Wrapped Sequence Numbers (PAWS)
• Sample PCAP
TCP Timeout (RTO) and Retransmission
• One problem we mentioned earlier was that how long a Sender
should wait before concluding that the packet is lost (Sent packet or
ACK)
• A Time out needs to be at least somewhere around = Time to send
the packet to Receiver+ Time for the ACK to travel back
• Problem is that both factors aren’t constant and can vary in the
network and we cant rely on the users to tell for every circumstances.
• The Protocol tries to (guess-)estimate by itself and is known as Round
Trip Estimation.
• It’s a statistical process and is close to sample mean of a collection of
samples of RTTs. (Note: The avg. changes over time as the network
condition changes)
TCP Round Trip Time and Timeout
Q: Challenges with setting TCP timeout Q: how to estimate RTT?
value? • SampleRTT: measured time from
• It has to be longer than RTT segment transmission until ACK receipt
• but RTT varies so it must adapt • ignore retransmissions, why?
• Importance of accurate RTT estimators: • SampleRTT will vary, want estimated
RTT “smoother”
• If its set too short:
• premature timeout which will cause unnecessary • average several recent
retransmissions measurements, not just current
• If its set too long: SampleRTT
• slow reaction to segment loss
Adaptive Retransmission(Original Algorithm)
350
300
250
RTT (milliseconds)
200
150
100
1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106
time (seconnds)
SampleR TT
SampleR TT
• ACK is for Original transmission but was for retransmission => Sample RTT is too large
• ACK is for retransmission but was for original => Sample RTT too small
Karn/Partridge Algorithm(Simple Proposal)
• Sender sends a packet, TCP sets up the timer (Standard Algo or Timestamps) and if the ACK isn't
received within that time then it triggers a retransmit of the packet.
• Retransmission Timeout are considered really bad within TCP world (Will look later on why, for now
just take the word)
• The basic idea is that a receiver get’s 3 DUP ACK’s then send the packet immediately and not wait
for the RTO which allows to react faster on lost packets.
• We will still have Retransmission Timeout (RTO) but it will act as a back up mechanism.
Fast Retransmit (Ex:1)
Ignore cwnd at this moment. We will look
into that during congestion avoidance
Assumption:
• Only one packet in a window is dropped. In this
diagram, it is Pkt1.
• TCP at the sending side implements fast
retransmit algorithm.
• The retransmitted packet is completely received
at the receiver.
All packets in transit are not enough to
trigger fast retransmit.
Fast Retransmit (Ex:2)
Assumption:
• Only one packet in a window is
dropped. In this diagram, it is Pkt5.
• TCP at the sending side implements
fast retransmit algorithm.
• The retransmitted packet is
completely received at the receiver
In the example, packets in transit are
enough to trigger fast retransmit.
Therefore, after receiving the third
duplicate ACK5, TCP at the sending
side retransmits Pkt5
Fast Retransmit(Ex:3)
Assumption:
• Two packets in a window are
dropped. In this diagram, they
are Pkt7 and Pkt9.
• TCP at the sending side
implements fast retransmit
algorithm.
• The retransmitted packet is
completely received at the
receiver.
Retransmission of Pkt9 happens
because Pkt9's RTO is expired - not fast
retransmit. Why? As shown that there
are no enough packets in transit for
triggering the 2nd fast retransmit.
Fast Retransmit - Recap
• So we saw how Fast Retransmit helps in recovering quickly in the case of a single
packet loss but not in the case of multiple packet loss.
Lets talks about Flow Control
TCP Traffic Control
• Traffic control
• There are two reasons for sender to reduce the rate of sending packets.
• When receiver’s buffer space is not enough, flow control
• When the network is congested, congestion control
network
congestion
Small-capacity
receiver
Flow Control
• The idea here is to not let the Sender overwhelm with more data then
what a receiver can handle.
• The way this can be achieved is by Receiver telling the Sender whats its
buffer size (Capability to receive is) which gives the sender an idea how
much it can send.
• This is achieved by maintaining a Window Structure on Sender and
Receiver side.
Sliding Window Flow Control
Segments sent, but
Segments that can be sent
not acknowledged
0 1 2 3 0 1 2 3
0 1 2 3 0 1 2 3
Receiver tells the Sender its
window size which is
The last segment Window is shrinking Window expands
tracked at the Sender side
That was acked as the segments are received as acks are sent
as a variable “awnd”
(b) receiver’s window (Advertised Window)
Window Flow Control
RTT
Source 1 2 W 1 2 W
time
data ACKs
Destination 1 2 W 1 2 W
time
W MSS
• Source rate = bps
RTT
• If W too small then rate « capacity
If W too big then rate > capacity => congestion
• This doesn't take packet loss into account, Matthew Mathis formula
fixes it.
TCP Persist Timer
• Background
• When the TCP receiver advertises window = 0, the TCP sender stops sending
temporarily. Afterwards, the receiver lets the sender know it can receive
segments again by sending new window advertisement. But if this new
window advertisement is lost (since this doesn't contain any data, its not
delivered reliably), the sender will wait for the new advertisement forever.
(Deadlock!!)
• Solution
• After the sender knows window=0, the sender transmits window probe
segment periodically to check out if the receiver is ready to accept. The
window probe is sent according to the persist timer.
• Window probe is a segment of 1 byte length.
• TCP allows sender to transmit one byte even if the receiver’s window is closed.
• TCP persist timer is increasing exponentially.
TCP Persist Timer
win=0
window probe
win=0
ACK(win=0)
win=256 window probe
lost
ACK(win=0)
ACK(win=0)
Persist Timer
(normal TCP Exponential backoff)
Delayed Acknowledgement
Delayed acknowledgments for reducing number of segments: The receiver does not send Acks immediately after the
receipt of an (error-free) packet but waits up to ~200ms/500ms if a packet is in the send buffer (depends on host OS). If so
it, piggybacks the ACK onto the transmit packet; if no transmit packet is available the receiver sends an Ack latest after
~200ms/500ms
Questions ?