WHITE PAPER
Congestion Management and Buffering in Data Center Networks
Introduction
Several technology and business trends are driving the transformation of the data center. Server and storage virtualization is changing the way applications, servers, and storage are deployed. 10G/40G Ethernet, Data Center Bridging (DCB), and Low Latency Ethernet are opening up the possibility of an all-Ethernet data center.
Growing East-West traffic patterns within the data center are driving flatter network architectures that are optimized for fewer hops, lower latency, and mesh-type connectivity.
Within this changing landscape, traffic management and congestion management within the data center take on added importance. Traditional approaches to congestion management, such as adding arbitrarily large buffers to network switches, can be detrimental to key IT requirements such as performance, latency, and application responsiveness, and can add significant cost.
Congestion Management in Data Center Networks
The traditional rule of thumb for sizing the buffer on a network link is the bandwidth-delay product:
B = C * RTT
where B = buffer size for the link, C = data rate of the link, and RTT = average round trip time.
However, research [1] suggests that while this metric may hold true for a single long-lived flow, most network switches and routers serve a much larger number of concurrent flows, a mix of short-lived and long-lived traffic. In such an environment, the research shows that a link carrying n flows requires a buffer no larger than:
B = (C * RTT) / √n
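To make the two sizing rules concrete, the short Python sketch below computes both the full bandwidth-delay product and the reduced, flow-count-aware buffer size. The link rate, RTT, and flow count used here are illustrative values chosen for this sketch, not figures taken from the cited research.

```python
import math


def bdp_buffer_bytes(link_rate_bps: float, rtt_s: float) -> float:
    """Classic rule of thumb: buffer = bandwidth-delay product, B = C * RTT."""
    return link_rate_bps * rtt_s / 8  # bits -> bytes


def reduced_buffer_bytes(link_rate_bps: float, rtt_s: float, n_flows: int) -> float:
    """Buffer for a link carrying n concurrent long-lived flows:
    B = (C * RTT) / sqrt(n), per Appenzeller et al. [1]."""
    return bdp_buffer_bytes(link_rate_bps, rtt_s) / math.sqrt(n_flows)


if __name__ == "__main__":
    rate = 10e9    # 10 Gb/s link (illustrative)
    rtt = 100e-6   # 100 microsecond round trip time (illustrative)
    flows = 500    # concurrent flows sharing the link (illustrative)

    print(f"Bandwidth-delay product: {bdp_buffer_bytes(rate, rtt) / 1e3:.1f} KB")
    print(f"With sqrt(n) reduction : {reduced_buffer_bytes(rate, rtt, flows) / 1e3:.1f} KB")
```

For these assumed values, the full bandwidth-delay product is 125 KB, while sharing the link among 500 flows reduces the requirement to roughly 5.6 KB.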
Data center networks differ from the WAN environments this rule was derived for. These networks have very low Round Trip Times (RTTs) due to the short distances involved, and traffic tends to arrive in bursts; buffering in the network can help smooth this out and improve application performance, so burst absorption is becoming important. Additionally, when dealing with large, complex data sets that are constantly changing, as well as in environments such as online transaction processing (OLTP), there is greater emphasis on the latency and responsiveness of the network.
With a smart, shared buffer pool, the queue drop thresholds for each queue are not statically assigned; they adapt to congestion conditions, as described in the sections that follow.
BUFFER SIZING
Consider, as an example, a burst of traffic arriving on one port and being replicated across five ports on the same switch silicon. All ports in the example are 40GbE ports using Quad Small Form-factor Pluggable (QSFP+) interfaces. The buffering needed to absorb the burst in this scenario works out to 3.072 MB + 74,880 B = 3.145 MB.
While the above calculation is an example of the buffer sizing required to absorb a temporary burst, several other factors come into play. Simulations (see Figure 2) have shown that for a switch with 64 10GbE ports, using a uniform random traffic distribution across all ports, a burst size of 32 KB, a packet size of 1500 bytes, a loading factor of 80 percent on all ports, and a target frame loss rate of 0.1 percent (to guarantee high TCP goodput), a packet buffer of 5.52 MB is effective in achieving the target frame loss rate [2]. The results are based on adaptive dynamic sharing of buffers (i.e., not a static per-port allocation).
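The back-of-the-envelope Python sketch below illustrates why shared buffering is well matched to bursty traffic: it models a simplified incast in which several senders burst simultaneously toward one egress port of the same speed. The sender count and burst size are assumptions made for this sketch; it is not a reproduction of the simulation cited above.

```python
def peak_queue_bytes(n_senders: int, burst_bytes: int) -> int:
    """Peak egress queue depth when n_senders each transmit one burst at line
    rate toward a single egress port of the same speed, all starting together.
    While the bursts arrive, the egress port drains roughly one sender's worth
    of data, so about (n_senders - 1) bursts must sit in the buffer."""
    return (n_senders - 1) * burst_bytes


if __name__ == "__main__":
    senders = 32             # assumed number of simultaneous senders
    burst = 32 * 1024        # assumed burst size per sender: 32 KB
    shared_pool = 5_520_000  # 5.52 MB buffer shown effective in the cited simulation

    need = peak_queue_bytes(senders, burst)
    print(f"Peak queue depth           : {need / 1e6:.2f} MB")
    print(f"Fits in 5.52 MB shared pool: {need <= shared_pool}")
```

Under these assumptions the microburst needs roughly 1 MB of buffering, well within a shared pool of that size, even though a static per-port carve-up of the same memory might not absorb it.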
The Extreme Networks Summit X670 TOR switch provides 9 MB of smart packet buffer across its 48 10GbE and 4 40GbE ports. Similarly, the Extreme Networks BlackDiamond X8 utilizes shared smart buffering technology on both its I/O modules and its fabric modules. For example, the 24-port 40G module uses four packet-processing silicon chips, each of which provides 9 MB (72 Mb) of smart packet buffer. A small
number of buffers are reserved on a per-port/per-queue basis,
which is configurable. The rest of the buffers are available as
a shared pool that can be allocated dynamically across ports/
queues. As described earlier, adaptive discard thresholds can
be configured to limit the maximum number of buffers that can
be consumed by a single queue. The threshold adjusts itself
dynamically to congestion conditions to maximize the burst
absorption capability while still providing fairness. As shown in
the simulation results, the 9 MB (72 Mb) of shared dynamic smart packet buffer comfortably exceeds the 5.52 MB found to be effective for the simulated workload.
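The adaptive discard thresholds described above can be sketched with the classic dynamic-threshold scheme of Choudhury and Hahne [3], in which each queue's admission limit is a multiple (alpha) of the currently unused shared buffer. The Python sketch below is a simplified illustration of that idea, not the threshold logic actually implemented in the switch silicon.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class SharedBuffer:
    """Dynamic-threshold admission for a shared packet buffer, in the spirit of
    Choudhury and Hahne [3]: each queue's drop threshold is alpha times the
    currently unused buffer, so thresholds shrink as the pool fills and grow
    back as it drains."""
    total_bytes: int
    alpha: float = 1.0
    queues: Dict[str, int] = field(default_factory=dict)

    @property
    def used(self) -> int:
        return sum(self.queues.values())

    def threshold(self) -> float:
        # Per-queue dynamic limit: alpha * (free buffer space).
        return self.alpha * (self.total_bytes - self.used)

    def try_enqueue(self, queue_id: str, pkt_bytes: int) -> bool:
        qlen = self.queues.get(queue_id, 0)
        # Admit only if the queue stays under its dynamic threshold and the
        # packet physically fits in the remaining shared space.
        if qlen + pkt_bytes > self.threshold():
            return False  # discarded by the adaptive threshold
        if self.used + pkt_bytes > self.total_bytes:
            return False  # shared pool exhausted
        self.queues[queue_id] = qlen + pkt_bytes
        return True

    def dequeue(self, queue_id: str, pkt_bytes: int) -> None:
        # Packet transmitted: release its buffer space.
        self.queues[queue_id] = max(0, self.queues.get(queue_id, 0) - pkt_bytes)
```

Because the threshold shrinks as the pool fills, a single congested queue cannot monopolize the shared memory, yet an isolated burst can still draw on most of the free space.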
CONGESTION MANAGEMENT
Right-sizing buffers in the network, along with the right buffer management schemes, rests on two observations about end-to-end traffic.
The first is that the source can adapt its transmission rate
to match the network conditions. For example, TCP has
sophisticated congestion management capabilities which
rely on congestion feedback from the network either
implicitly (duplicate ACKs or timeout) or explicitly (through
AQM such as RED-ECN).
The second is that the larger buffers at the end stations can be utilized for buffering the source's traffic rather than buffering it in the network, so that traffic management and buffering happen at the granularity of the source's flow rather than at an aggregate queue inside the network.
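As a concrete illustration of the explicit feedback path mentioned in the first point, the sketch below implements RED-style marking with ECN: an exponentially weighted moving average of queue depth drives a marking probability that rises between a minimum and a maximum threshold. The thresholds, maximum marking probability, and averaging weight are illustrative assumptions for this sketch, not recommended settings.

```python
import random


class RedEcn:
    """Minimal RED-style active queue management with ECN marking. The
    thresholds, maximum marking probability, and EWMA weight used here are
    illustrative assumptions, not recommended configuration values."""

    def __init__(self, min_th: float, max_th: float, max_p: float = 0.1, weight: float = 0.002):
        self.min_th = min_th
        self.max_th = max_th
        self.max_p = max_p
        self.weight = weight
        self.avg = 0.0  # EWMA of the instantaneous queue depth

    def on_packet(self, queue_len: float) -> str:
        # Update the exponentially weighted moving average of queue depth.
        self.avg = (1 - self.weight) * self.avg + self.weight * queue_len
        if self.avg < self.min_th:
            return "forward"   # no congestion signal
        if self.avg >= self.max_th:
            return "mark"      # persistent congestion: set ECN (or drop)
        # Between the thresholds, mark with linearly increasing probability.
        p = self.max_p * (self.avg - self.min_th) / (self.max_th - self.min_th)
        return "mark" if random.random() < p else "forward"
```

Marked packets cause ECN-capable TCP senders to reduce their transmission rate before the queue overflows, keeping in-network buffering shallow while the end stations absorb the backlog.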
It is also worth noting that the classic B = C * RTT sizing rule is partly an artifact of the fact that this number was optimized for WAN links with long round trip times, conditions that do not hold inside the data center.
Another example is replicated traffic such as multicast: with a shared smart buffer, the switch can hold a single copy of each packet and transmit out of the single copy to each of the egress ports, and possibly to multiple queues on each port.
While smart buffering technology with FADT has broad applicability, the above examples are representative of how it addresses some of the most stringent requirements in very specific markets.
Summary
The subject of network switch buffering is a complex one, and research on it is ongoing. Traditional approaches of over-buffering network switches may actually cause performance degradation, where performance and latency suffer as packets queue up in excessively deep buffers [2]. Right-sized shared smart buffers with adaptive thresholds, combined with end-to-end congestion management, deliver the needed burst absorption without those drawbacks.
Bibliography
1. G. Appenzeller, I. Keslassy, and N. McKeown. Sizing Router Buffers. Proceedings of ACM SIGCOMM, 2004.
2. J. Gettys and K. Nichols. Bufferbloat: Dark Buffers in the Internet. Communications of the ACM, Vol. 55, No. 1, pp. 57-65, 2012.
3. A. K. Choudhury and E. L. Hahne. Dynamic Queue Length Thresholds for Shared-Memory Packet Switches. IEEE/ACM Transactions on Networking, Vol. 6, 1998.
4. S. Das and R. Sankar. Broadcom Smart-Buffer Technology in Data Center Switches for Cost-Effective Performance Scaling of Cloud Applications. [Online] 2012. http://www.broadcom.com/collateral/etp/SBT-ETP100.pdf.
5. D. Crisan, A. S. Anghel, R. Birke, C. Minkenberg, and M. Gusat. Short and Fat: TCP Performance in CEE Datacenter Networks. Hot Interconnects, 2011.
6. N. Lippis. Lippis Report: Open Industry Network Performance and Power Test for Cloud Networks - Evaluating 10/40GbE Switches, Fall 2011 Edition. 2011.
7. V. Vasudevan, A. Phanishayee, H. Shah, E. Krevat, D. G. Andersen, G. R. Ganger, G. A. Gibson, and B. Mueller. Safe and Effective Fine-grained TCP Retransmissions for Datacenter Communication. ACM SIGCOMM, 2009.
8. M. Alizadeh, A. Greenberg, D. A. Maltz, J. Padhye, P. Patel, B. Prabhakar, S. Sengupta, and M. Sridharan. Data Center TCP (DCTCP). ACM SIGCOMM, 2010.
http://www.ExtremeNetworks.com/contact
Phone +1-408-579-2800
2014 Extreme Networks, Inc. All rights reserved. Extreme Networks and the Extreme Networks logo are trademarks or registered trademarks of Extreme Networks, Inc.
in the United States and/or other countries. All other names are the property of their respective owners. For additional information on Extreme Networks Trademarks
please see http://www.extremenetworks.com/about-extreme/trademarks.aspx. Specifications and product availability are subject to change without notice. 1856-0612
WWW.EXTREMENETWORKS.COM