
Multiterabit Networks

1. Introduction to Next-Generation Multiterabit Networks.

The explosive demand for bandwidth for data networking applications continues
to drive photonics technology toward ever increasing capacity in the backbone fiber
network and toward flexible optical networking. Already commercial Tb/s (per fiber)
transmission systems have been announced, and it can be expected that in the next
several years, we will begin to be limited by the 50 THz transmission bandwidth of silica
optical fiber. Efficient bandwidth utilization will be one of the challenges of photonics
research. Since traffic will be dominated by data, we can expect the network of the future to consist of multiterabit packet switches that aggregate traffic at the edge of the network, and cross-connects with wavelength granularity and tens of terabits of throughput in the core.

The infrastructure required to handle Internet traffic volume, which doubles every six months, consists of two complementary elements: fast point-to-point links and high-capacity switches and routers. Dense wavelength division multiplexing (DWDM) technology, which permits transmission of several wavelengths over the same optical medium, will enable optical point-to-point links to achieve an estimated 10 terabits per second by 2008. However, the rapid growth of Internet traffic coupled with the availability of fast optical links threatens to cause a bottleneck at the switches and routers.

Multiterabit packet-switched networks will require high-performance scheduling algorithms and architectures. With port densities and data rates growing at an unprecedented rate, prioritized scheduling schemes will be necessary to scale pragmatically toward multiterabit capacities. Further, the need to support strict QoS requirements for the diverse traffic loads that characterize emerging multimedia Internet traffic will only increase. Continuous improvements in VLSI and optical technologies will stimulate innovative solutions to the intricate packet-scheduling task.

2. DWDM
2.1 Options for Increasing Carrier Bandwidth
Faced with the challenge of dramatically increasing capacity while constraining
costs, carriers have two options: Install new fiber or increase the effective bandwidth of
existing fiber. Laying new fiber is the traditional means used by carriers to expand their
networks. Deploying new fiber, however, is a costly proposition. It is estimated at about
$70,000 per mile, most of which is the cost of permits and construction rather than the
fiber itself. Laying new fiber may make sense only when it is desirable to expand the
embedded base. Increasing the effective capacity of existing fiber can be accomplished in
two ways:
o Increase the bit rate of existing systems.
o Increase the number of wavelengths on a fiber.
2.2 Increase the Bit Rate
Using TDM, data is now routinely transmitted at 2.5 Gbps (OC-48) and,
increasingly, at 10 Gbps (OC-192); recent advances have resulted in speeds of 40 Gbps
(OC-768). The electronic circuitry that makes this possible, however, is complex and
costly, both to purchase and to maintain. In addition, there are significant technical issues
that may restrict the applicability of this approach. Transmission at OC-192 over single-
mode (SM) fiber, for example, is 16 times more affected by chromatic dispersion than the
next lower aggregate speed, OC-48. The greater transmission power required by the
higher bit rates also introduces nonlinear effects that can affect waveform quality.
Finally, polarization mode dispersion is another effect that limits the distance a light pulse can travel without degradation.
2.3 Increase the Number of Wavelengths
In this approach, many wavelengths are combined onto a single fiber. Using wavelength division multiplexing (WDM) technology, several wavelengths, or light colors, can simultaneously carry multiplexed signals of 2.5 to 40 Gbps each over a strand of fiber.
Without having to lay new fiber, the effective capacity of existing fiber plant can
routinely be increased by a factor of 16 or 32. Systems with 128 and 160 wavelengths are
in operation today, with higher density on the horizon. The specific limits of this
technology are not yet known.
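As a rough, hedged illustration of the capacity multiplication described above (nominal per-lambda rates; real systems lose some capacity to framing overhead and guard bands), the aggregate capacity of a fiber is simply the wavelength count times the per-wavelength bit rate:

    # Back-of-envelope aggregate capacity for the WDM configurations quoted
    # above; nominal rates, ignoring framing overhead and guard bands.
    for lambdas in (16, 32, 128, 160):
        for gbps in (2.5, 10, 40):
            tbps = lambdas * gbps / 1000
            print(f"{lambdas:>3} lambdas x {gbps:>4} Gbit/s = {tbps:5.2f} Tbit/s")
    # 160 lambdas at 40 Gbit/s already yields 6.4 Tbit/s on a single fiber.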

2.4 Wavelength Division Multiplexing


WDM increases the carrying capacity of the physical medium (fiber) using a
completely different method from TDM. WDM assigns incoming optical signals to
specific frequencies of light (wavelengths, or lambdas) within a certain frequency band.
Another way to think about WDM is that each channel is a different color of light;
several channels then make up a "rainbow." In a WDM system, each of the wavelengths
is launched into the fiber, and the signals are demultiplexed at the receiving end. Like
TDM, the resulting capacity is an aggregate of the input signals, but WDM carries each
input signal independently of the others. This means that each channel has its own
dedicated bandwidth; all signals arrive at the same time, rather than being broken up and
carried in time slots. The difference between WDM and dense wavelength division multiplexing (DWDM) is fundamentally one of degree only. DWDM spaces the wavelengths more closely than does WDM, and therefore has a greater overall capacity.
The limits of this spacing are not precisely known, and have probably not been reached,
though systems are available in mid-year 2000 with a capacity of 128 lambdas on one
fiber. DWDM has a number of other notable features, which are discussed in greater
detail in the following chapters. These include the ability to amplify all the wavelengths
at once without first converting them to electrical signals, and the ability to carry signals
of different speeds and types simultaneously and transparently over the fiber (protocol
and bit rate independence).
Note: WDM and DWDM use single-mode fiber to carry multiple lightwaves of differing
frequencies. This should not be confused with transmission over multimode fiber, in
which light is launched into the fiber at different angles, resulting in different "modes" of
light. A single wavelength is used in multimode transmission.

2.5 TDM and WDM Compared


SONET TDM takes synchronous and asynchronous signals and multiplexes them
to a single higher bit rate for transmission at a single wavelength over fiber. Source
signals may have to be converted from electrical to optical or from optical to electrical
and back to optical before being multiplexed. WDM takes multiple optical signals, maps
them to individual wavelengths, and multiplexes the wavelengths over a single fiber.
Another fundamental difference between the two technologies is that WDM can carry
multiple protocols without a common signal format, while SONET cannot.
2.6 Why DWDM?
From both technical and economic perspectives, the ability to provide potentially
unlimited transmission capacity is the most obvious advantage of DWDM technology.
The current investment in fiber plant can be not only preserved but also optimized by a factor
of at least 32. As demands change, more capacity can be added, either by simple
equipment upgrades or by increasing the number of lambdas on the fiber, without
expensive upgrades. Capacity can be obtained for the cost of the equipment, and existing
fiber plant investment is retained. Bandwidth aside, DWDM's most compelling technical
advantages can be summarized as follows:
Transparency—Because DWDM is a physical layer architecture, it can transparently
support both TDM and data formats such as ATM, Gigabit Ethernet, and Fibre Channel
with open interfaces over a common physical layer.
Scalability—DWDM can leverage the abundance of dark fiber in many metropolitan area
and enterprise networks to quickly meet demand for capacity on point-to-point links.
Dynamic provisioning—Fast, simple, and dynamic provisioning of network connections gives providers the ability to deliver high-bandwidth services in days rather than months.

3. THE ARCHITECTURE OF INTERNET ROUTERS


This section gives a general introduction to the architecture of routers and the functions of their various components. This background is important for understanding the bottlenecks in achieving high-speed routing and how these are handled in the design of the gigabit and even terabit capacity routers available in the market today.
3.1 Routing Principles
The principal criterion of successful routing is, of course, correctness, but it is not
the only criterion. You might prefer to take the most direct route (the one that takes the
least time and uses the least fuel), the most reliable route (the one that is not likely to be
closed by a heavy snowfall), the most scenic route (the one that follows pleasant country
roads rather than busy highways), the least expensive route (the one that follows freeways
rather than toll roads), or the safest route (the one that avoids the army's missile testing
range). In its most general form, optimal routing involves forwarding a packet from
source to destination using the "best" path.
3.2 Requirements
These observations suggest that an open systems routing architecture should:
1. Scale well
2. Support many different subnetwork types and multiple qualities of service
3. Adapt to topology changes quickly and efficiently (i.e., with minimum overhead and complexity)
4. Provide controls that facilitate the "safe" connection of multiple organizations
It is not likely that the manual administration of static routing tables (the earliest medium for the maintenance of internetwork routes, in which a complete set of fixed routes from each system to every other system was periodically, often no more frequently than once a week, loaded into a file on each system) will satisfy these objectives for a
network connecting more than a few hundred systems. A routing scheme for a large-scale
open systems network must be dynamic, adaptive, and decentralized; be capable of
supporting multiple paths offering different types of service; and provide the means to
establish trust, firewalls, and security across multiple administrations (see ISO/IEC TR
9575, the OSI Routing Framework).
3.3 OSI Routing Architecture
The architecture of routing in OSI is basically the same as the architecture of
routing in other connectionless (datagram) networks, including TCP/IP. As usual,
however, the conceptual framework and terminology of OSI are more highly elaborated
than those of its roughly equivalent peers, and thus, it is the OSI routing architecture that
gets the lion's share of attention here. Keep in mind that most of what is said about the
OSI routing architecture applies to hop-by-hop connectionless open systems routing in
general. The OSI routing scheme consists of:
* A set of routing protocols that allow end systems and intermediate systems to collect and distribute the information necessary to determine routes
* A routing information base containing this information, from which routes between end systems can be computed. (Like a directory information base, the routing information base is an abstraction; it doesn't exist as a single entity. The routing information base can be thought of as the collective (distributed) wisdom of an entire subsystem concerning the routing-relevant connectivity among the components of that subsystem.)
* A routing algorithm that uses the information contained in the routing information base to derive routes between end systems
End systems (ESs) and intermediate systems (ISs) use routing protocols to distribute ("advertise") some or all of the information stored in their locally maintained routing information bases.
ESs and ISs send and receive these routing updates and use the information they contain (along with information that may be available from the local environment, such as information entered manually by an operator) to modify their routing information base.

The routing information base consists of a table of entries that identify a destination (e.g., a network service access point address); the subnetwork over which packets should be forwarded to reach that destination (also known as the next hop, or "next hop subnetwork point of attachment address"); and some form of routing metric, which expresses one or more of the characteristics of the route (its delay properties, for example, or its expected error rate) in terms that can be used to evaluate the suitability of this route, compared to another route with different properties, for conveying a particular packet or class of packets. The routing information base may contain information about more than one "next hop" to the same destination if it is important to be able to send packets over different paths depending on the way in which the "quality of service" specified in the packet's header corresponds to different values of the routing metric(s).

The routing algorithm uses the information contained in the routing information base to compute the actual routes ("next hops"); these are collectively referred to as the forwarding information base. It is important to recognize that the routing information base is involved in computations that take place in the "background," independent of the data traffic flowing between sources and destinations at any given moment, whereas the forwarding information base is involved in the real-time selection of an outgoing link for every packet that arrives on an incoming link. It must therefore be implemented in such a way that it does not become a performance-killing bottleneck in a real-world intermediate system (router).

No system (certainly not an end system, which is supposed to be devoted primarily to tasks other than routing) can maintain a routing information base containing all the information necessary to specify routes from any "here" to any "there" in the entire global Internet. Neither is it possible to design a single routing protocol that operates well both in local environments (in which it is important to account quickly for changes in the local network topology) and in wide area environments (in which it is important to limit the percentage of network bandwidth that is consumed by "overhead" traffic such as routing updates).
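A minimal sketch of these two structures may help fix the ideas. The field names and the lowest-metric selection rule are illustrative assumptions, not part of any OSI specification; a production router would also keep routes per destination and QoS class rather than a single best entry:

    from dataclasses import dataclass

    @dataclass
    class RibEntry:
        destination: str  # e.g., a network service access point address
        next_hop: str     # next-hop subnetwork point-of-attachment address
        metric: int       # delay, expected error rate, etc., encoded as a cost

    def build_fib(rib):
        """Derive the forwarding information base from the routing
        information base by keeping the best (lowest-metric) entry per
        destination. The FIB is consulted for every packet, so a real
        implementation uses a fast-path structure; a dict stands in here."""
        fib = {}
        for entry in rib:
            best = fib.get(entry.destination)
            if best is None or entry.metric < best.metric:
                fib[entry.destination] = entry
        return fib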
3.4 Router Functions
Functions of a router can be broadly classified into two main categories:
1. Datapath Functions: These functions are applied to every datagram that reaches the router and is successfully routed without being dropped at any stage. The main functions in this category are the forwarding decision, forwarding through the backplane, and output link scheduling.
2. Control Functions: These functions mainly include system configuration, management, and the updating of routing table information. They do not apply to every datagram and are therefore performed relatively infrequently.
The goal in designing high-speed routers is to increase the rate at which datagrams are routed, so the datapath functions are the ones to improve to enhance performance. Here we briefly discuss the major datapath functions.
* The Forwarding Decision: A routing table search is done for each arriving datagram, and the output port is determined based on the destination address. In addition, a next-hop MAC address is appended to the front of the datagram, the time-to-live (TTL) field of the IP datagram header is decremented, and a new header checksum is calculated (a minimal sketch of these header updates appears after this list).
* Forwarding through the backplane: The backplane is the physical path between the input port and the output port. Once the forwarding decision is made, the datagram is queued before it can be transferred to the output port across the backplane. If there is not enough space in the queues, the datagram may be dropped.
* Output Link Scheduling: Once a datagram reaches the output port, it is again queued before it can be transmitted on the output link. Most traditional routers maintain a single FIFO queue, but more advanced routers maintain separate queues for different flows or priority classes and carefully schedule the departure time of each datagram in order to meet various delay and throughput guarantees.
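A minimal sketch of the per-datagram header updates named in the forwarding-decision item above, assuming a toy routing table and a dict-based datagram (both hypothetical; real routers perform the longest-prefix match and checksum update in ASICs). The checksum step uses the well-known incremental-update trick from RFC 1141:

    import ipaddress

    # Hypothetical routing table: (prefix, output port, next-hop MAC address).
    ROUTES = [
        (ipaddress.ip_network("10.0.0.0/8"), 1, "aa:bb:cc:00:00:01"),
        (ipaddress.ip_network("10.1.0.0/16"), 2, "aa:bb:cc:00:00:02"),
    ]

    def lookup(dst):
        """Longest-prefix match: the most specific matching prefix wins."""
        addr = ipaddress.ip_address(dst)
        matches = [r for r in ROUTES if addr in r[0]]
        return max(matches, key=lambda r: r[0].prefixlen, default=None)

    def ttl_checksum_update(csum):
        # Decrementing TTL lowers the 16-bit TTL/protocol header word by
        # 0x0100, so the one's-complement checksum rises by 0x0100; fold
        # the end-around carry back in (RFC 1141).
        csum += 0x0100
        return (csum & 0xFFFF) + (csum >> 16)

    def forward(datagram):
        """Return the output port for the backplane, or None to drop."""
        route = lookup(datagram["dst"])
        if route is None or datagram["ttl"] <= 1:
            return None                      # no route, or TTL expired
        _, port, mac = route
        datagram["ttl"] -= 1                 # decrement time-to-live
        datagram["checksum"] = ttl_checksum_update(datagram["checksum"])
        datagram["next_hop_mac"] = mac       # new link-layer destination
        return port

    print(forward({"dst": "10.1.2.3", "ttl": 64, "checksum": 0x1C46}))  # -> 2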

3.5 Evolution of Present Day Routers


The architecture of the earliest routers was based on that of a computer, as shown in Fig 1. It has a shared central bus, a central CPU, memory, and line cards for the input and output ports. The line cards provide MAC-layer functionality and connect to the external links. Each incoming packet is transferred to the CPU across the shared bus, the forwarding decision is made there, and the packet then traverses the shared bus again to the output port. The performance of these routers is limited mainly by two factors: first, the processing power of the central CPU, since route table search is a highly time-consuming task; and second, the fact that every packet has to traverse the shared bus twice.

Fig 1. 1st Gen. Routers (routing CPU, buffer memory, and line cards on a shared bus)


To remove the first bottleneck, some router vendors introduced parallelism by using multiple CPUs, each handling a portion of the incoming traffic. Still, each packet had to traverse the shared bus twice. Soon the design of router architectures advanced one step further, as shown in Fig 2: a route cache and processing power are provided at each interface, forwarding decisions are made locally, and each packet has to traverse the shared bus only once, from the input port to the output port.
Fig 2. 2nd Gen. Routers (routing CPU and buffer memory, with a route cache and buffer memory on each line card)


Even though CPU performance improved with time, it could not keep pace with the increase in the line capacity of the physical links, and it is not possible to make forwarding decisions for the millions of packets per second arriving on each input link. Therefore, special-purpose ASICs (application-specific integrated circuits) are now placed on each interface; these outperform a CPU in making forwarding decisions, managing queues, and arbitrating access to the bus. The shared bus, however, still allowed only one packet at a time to move from an input port to an output port. Finally, this last architectural bottleneck was eliminated by replacing the shared bus with a crossbar switch, so multiple line cards can now communicate with each other simultaneously.

3.6 Assessing Router Performance


In this section, several parameters are listed that can be used to grade the performance of new-generation router architectures. These parameters reflect the exponential growth of traffic and the convergence of voice, video, and data.
* High packet transfer rate: Increasing Internet traffic makes the packets-per-second capacity of a router the single most important parameter for grading its performance. Further, considering the exponential growth of traffic, the capacity of routers must be scalable.
* Multi-service support: Most network backbones support both ATM and IP traffic and will continue to do so, as both technologies have their advantages. Therefore, routers must support ATM cells, IP frames, and other network traffic types in their native modes, delivering the full efficiency of the corresponding network type.
* Guaranteed short deterministic delay: Real-time voice and video traffic require short and predictable delay through the system. Unpredictable delay results in discontinuities that are not acceptable for these applications.
* Quality of Service: Routers must be able to support service level agreements, guaranteed line rate, and differentiated quality of service for different applications or flows. This quality of service support must be configurable.
* Multicast Traffic: Internet traffic is changing from predominantly point-to-point to multicast, and therefore routers must support large numbers of simultaneous multicast transmissions.
* High Availability: High-speed routers located in backbones handle huge amounts of data and cannot be taken down for upgrades. Therefore, features such as hot-swappable software tasks, which allow in-service software upgrades, are required.

3.7 An Optical Router: Lucent’s Lambda Router


Optical routers transmit traffic by using mirrors to bounce photonic signals along a path. These optical routers are also called wavelength or lambda routers. The mirrors used in these routers employ various types of optical filters and coatings to handle signals of various wavelengths and also to change their wavelengths. Currently, optical routers are designed to work with a mesh-based optical network infrastructure with point-to-point links between multiple locations. This allows service providers to re-route optical traffic instantly over alternative routes if the shortest route fails or becomes congested. Note that although these devices are called routers, current optical router products still don't route lambdas. Instead, they reflect light signals across a network over a pre-determined path using routing information typically carried over a separate out-of-band network. True optical routing, in which routers can read the addresses and other information carried in photons, is still a technology to look forward to. Switching packets in light would require a transducer that can convert light signals, a technology that is currently still in research. Among the vendors currently working on all-optical routers are Sycamore Networks, Tellium, Monterey Networks, and Lucent Technologies. We now look at Lucent Technologies' WaveStar LambdaRouter, which combines the capacity and robustness of an all-optical switch with the intelligence of a data router. Signals that pass through the device require no conversion from optical to electrical representation.

3.8 Optical versus electrical

A major debate exists over whether next-generation multiterabit routing scenarios should use optical or electrical switch fabrics. At first glance, all-optical switches are the straightforward solution because they enable high-speed multiterabit aggregation and concentration. However, emerging router functions, such as quality of service (QoS) provisioning, require that packets be buffered, typically at the network processors, until the scheduler grants transmission. Because dynamic optical buffering is impractical, delayed transmissions currently dictate the use of optoelectronic and electro-optical conversions. Switching nodes more efficiently process network protocols that deploy fixed packet sizes, such as asynchronous transfer mode (ATM), because fixed sizes simplify buffer and transmission management.

The network processor typically segments variable-length packets, such as IP packets, into smaller, fixed-size data units that traverse the switch core, and later reassembles them at the output ports for transmittal in their original format. The length of the data units passing through the switch core directly affects switch architecture and performance. Larger data units relax the timing requirements imposed on the switching and scheduling mechanisms, while smaller units offer finer switching granularity. Designs commonly use 64-byte data units as a tradeoff.
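A toy sketch of this segmentation-and-reassembly step (real network processors tag each data unit with reassembly state such as a flow identifier and sequence number; that bookkeeping is omitted here):

    def segment(packet, unit=64):
        """Slice a variable-length packet into fixed-size data units for
        the switch core; real designs pad the final partial unit."""
        return [packet[i:i + unit] for i in range(0, len(packet), unit)]

    def reassemble(units):
        """Rebuild the original packet at the egress port."""
        return b"".join(units)

    ip_packet = bytes(1500)        # a typical maximum-size IP packet
    units = segment(ip_packet)     # 24 data units: 23 full, one 28-byte tail
    assert reassemble(units) == ip_packet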

3.9 QoS classes

Accompanying the rapid growth of Internet traffic is the increasing popularity of multimedia applications sensitive to mean packet delay and jitter. To provide differentiated services, developers categorize incoming packets into QoS classes. For each QoS class, the router must not exceed the specified latency and jitter requirements. For routers to operate under heavy traffic loads while supporting QoS, designers must implement smarter scheduling schemes, a difficult task given that routers usually configure the crosspoint and transmit data on a per-data-unit basis. The scheduling algorithm must make a new configuration decision with each incoming data unit. In an ATM switch, where the data unit is a 53-byte cell, the algorithm must issue a scheduling decision every 168 ns at 2.5-Gbit-per-second line rates. As 10-Gbit-per-second port rates become standard for high-end routers, the decision time shrinks fourfold, to a mere 42 ns.
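The decision budget quoted above is straightforward serialization arithmetic: the scheduler gets exactly one cell time per decision. A quick check at nominal line rates (the article's 168 ns figure is the same order of magnitude):

    CELL_BITS = 53 * 8  # one ATM cell = 53 bytes

    for rate_gbps in (2.5, 10.0):
        budget_ns = CELL_BITS / rate_gbps  # bits / (Gbit/s) = nanoseconds
        print(f"{rate_gbps:>4} Gbit/s -> one decision every ~{budget_ns:.0f} ns")
    # ~170 ns at 2.5 Gbit/s and ~42 ns at 10 Gbit/s: the fourfold
    # reduction described in the text.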

4. QUEUING STRATEGIES
The switch fabric core embodies the crosspoint element responsible for matching
N input and output ports. Currently, routers incorporate electrical crosspoints, but optical crosspoint solutions, such as those based on dynamic DWDM, show promise. Regardless
of the underlying technology, the basic functionality of determining the crosspoint
configuration and transmitting data remains the same.
4.1 Input queuing
We can generally categorize switch queuing architectures as input queued or output queued. In input-queued switches, network processors store arriving packets in FIFO buffers that reside at the input port until the processor signals them to traverse the crosspoint. A disadvantage of input queuing is that a packet at the front of a queue can prevent other packets from reaching potentially available destination (egress) ports, a phenomenon called head-of-line (HOL) blocking. Consequently, the overall switching throughput degrades significantly: for uniformly distributed Bernoulli i.i.d. traffic flows, the maximum achievable throughput using input queuing is 58 percent of the switch core capacity. Virtual output queuing (VOQ) entirely eliminates HOL blocking. As Fig 6 shows, every ingress port in VOQ maintains N separate queues, each associated with a different egress port. The network processor automatically classifies and stores packets upon arrival in the queue corresponding to their destination. VOQ thus ensures that held-back packets do not block packets destined for available outputs. Assigning several prioritized queues, instead of one queue, to each egress port provides per-class QoS differentiation.

Fig 3. Input Queuing. Fig 4. HOL blocking.
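A minimal data-structure sketch of VOQ at a single ingress port (illustrative Python; the class and method names are my own, not from any product):

    from collections import deque

    class VirtualOutputQueues:
        """One FIFO per egress port at one ingress port. A packet held back
        by a busy output can no longer block packets bound for other
        outputs, which is how VOQ eliminates HOL blocking."""

        def __init__(self, num_ports):
            self.queues = [deque() for _ in range(num_ports)]

        def enqueue(self, packet, egress_port):
            # Packets are classified by destination egress port on arrival.
            self.queues[egress_port].append(packet)

        def requests(self):
            # Egress ports this ingress will request in the next match round.
            return [port for port, q in enumerate(self.queues) if q]

        def dequeue(self, egress_port):
            return self.queues[egress_port].popleft()

Per-class QoS differentiation, as noted above, would replace each single FIFO with a small set of prioritized queues.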

4.2 Output queuing

Output queuing strategies directly transfer arriving packets to their designated egress ports. A contention-resolution mechanism handles cases in which two or more ingress ports simultaneously request packet transmission to the same egress port. A time division multiplexing (TDM) solution accelerates the switch fabric's internal transmission rates by a factor of N with respect to the port bit rates. The growing prevalence of 10- and 40-Gbit-per-second port rates, however, makes acceleration by N infeasible. Trading space for time by deploying space division multiplexing (SDM) techniques requires a dedicated path within the switch fabric for each input-output pair. However, SDM implies having O(N^2) internal paths, only N of which can be used at any given time, because at most N input ports simultaneously transmit to N output ports. This requirement renders SDM impractical for large port densities such as N = 64.
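The space-time tradeoff is easy to quantify at the port density the text mentions; the dedicated wiring grows quadratically while the usable parallelism grows only linearly:

    N = 64                                  # port density from the example
    print(f"SDM internal paths: {N * N}")   # 4096 dedicated paths
    print(f"usable at any instant: {N}")    # at most N concurrent transfers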
Fig 5. Output Queuing. Fig 6. VOQ.

5. SCHEDULING APPROACHES
The main challenge of packet scheduling is designing fast yet clever algorithms that determine input-output matches which, at any given time:
• maximize switch throughput utilization by matching as many input-output pairs as possible,
• minimize the mean packet delay as well as jitter,
• minimize packet loss resulting from buffer overflow, and
• support strict QoS requirements in accordance with diverse data classes.
Intuitively, these objectives appear contradictory. Temporarily maximizing input-output matches, for example, may not result in optimal bandwidth allocation in terms of QoS, and vice versa. Scheduling is clearly a delicate task of assigning ingress ports to egress ports while optimizing several performance parameters. Moreover, as port densities and bit rates increase, the scheduling task becomes increasingly complex because more decisions must be made within shorter time frames. Advanced scheduling schemes exploit concurrency and distributed computation to offer a faster, more efficient decision process.

5.1 PIM and RRM


Commonly deployed scheduling algorithms derive from parallel iterative matching (PIM), an early discipline developed by Digital Equipment Corp. for a 16-port, 1-Gbit-per-second switch. PIM and its popular derivatives use randomness to avoid starvation and maximize the matching process. Unmatched inputs and outputs contend during each time slot in a three-step process:
• Request. All unmatched inputs send requests to every output to which they have packets to send.
• Grant. Each output randomly selects one of its requesting inputs.
• Accept. Each input randomly selects a single output from among those outputs that granted it.
The randomness also ensures that all requests are eventually granted. However, PIM has significant drawbacks, principally large queuing latencies in the presence of traffic loads exceeding 60 percent of the maximal switch capacity (calculated as the number of ports multiplied by the port data rate). Moreover, PIM's inability to provide prioritized QoS and its requirement for O(N^2) connectivity make it impractical for modern switching cores. Developers designed the round-robin matching (RRM) algorithm to overcome PIM's disadvantages in terms of both fairness and complexity. Instead of arbitrating randomly, RRM makes selections based on a prescribed rotating priority discipline, with two pointers that update after every "grant" and "accept." RRM is a minor improvement over PIM, but its overall performance remains poor under non-uniformly distributed traffic loads because of pointer synchronization.
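A compact sketch of one PIM request-grant-accept round (illustrative Python; representing each ingress port's non-empty VOQs as a set is my own assumption). In practice several such rounds run within a time slot, with matched ports removed before the next round:

    import random

    def pim_round(requests):
        """One request-grant-accept round of parallel iterative matching.
        `requests[i]` is the set of egress ports for which unmatched
        ingress port i has queued packets; returns a matching {in: out}."""
        # Grant: each requested output randomly picks one requesting input.
        requesters = {}
        for inp, outs in requests.items():
            for out in outs:
                requesters.setdefault(out, []).append(inp)
        grants = {out: random.choice(inps) for out, inps in requesters.items()}

        # Accept: each input randomly picks one output that granted it.
        granted = {}
        for out, inp in grants.items():
            granted.setdefault(inp, []).append(out)
        return {inp: random.choice(outs) for inp, outs in granted.items()}

    # Inputs 1 and 2 both contend for output 2; randomness breaks the tie
    # without starving either input over time.
    print(pim_round({0: {1, 2}, 1: {2}, 2: {2}}))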
5.2 iSLIP
iSLIP, the widely implemented iterative algorithm developed by Stanford University's Nick McKeown, is a popular descendant of PIM and RRM that consists of the following steps:
• Request. All unmatched inputs send requests to every output to which they have packets to send.
• Grant. Each output selects a requesting input that coincides with a predefined priority sequence. A pointer indicates the current location of the highest-priority element and, if the grant is accepted, increments (modulo N) to one beyond the granted input.
• Accept. Each input selects one granting output according to a predefined priority order. A separate pointer indicates the position of the highest-priority element and increments (modulo N) to one location beyond the accepted output.
Instead of updating after every grant, the grant pointer updates only if an input accepts the grant. iSLIP significantly reduces pointer synchronization and accordingly increases throughput with a lower average packet delay. The algorithm does, however, suffer from degraded performance in the presence of non-uniform and bursty traffic flows, a lack of inherent QoS support, and limited scalability with respect to high port densities. Despite these weaknesses, iSLIP's low implementation complexity promotes its extensive deployment alongside various crosspoint switches.
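The pointer discipline is easiest to see in code. Below is a single-iteration iSLIP sketch (illustrative Python; real implementations use parallel priority encoders in hardware). The key detail is that both pointers advance only on an accepted grant, which desynchronizes the outputs and yields the throughput gain described above:

    class ISlip:
        """Single-iteration iSLIP for an N x N crossbar (sketch)."""

        def __init__(self, n):
            self.n = n
            self.grant_ptr = [0] * n   # per-output rotating priority
            self.accept_ptr = [0] * n  # per-input rotating priority

        def _pick(self, candidates, ptr):
            # First candidate at or after the pointer, wrapping modulo N.
            for k in range(self.n):
                idx = (ptr + k) % self.n
                if idx in candidates:
                    return idx
            return None

        def schedule(self, requests):
            """`requests[i]` is the set of outputs input i has cells for;
            returns the matching {input: output} for this time slot."""
            # Grant: each output picks the requesting input nearest its
            # grant pointer.
            grants = {}
            for out in range(self.n):
                reqs = {i for i in range(self.n) if out in requests[i]}
                g = self._pick(reqs, self.grant_ptr[out])
                if g is not None:
                    grants.setdefault(g, set()).add(out)
            # Accept: each input picks the granting output nearest its
            # accept pointer; pointers move only on accepted grants.
            matches = {}
            for inp, outs in grants.items():
                a = self._pick(outs, self.accept_ptr[inp])
                if a is not None:
                    matches[inp] = a
                    self.grant_ptr[a] = (inp + 1) % self.n
                    self.accept_ptr[inp] = (a + 1) % self.n
            return matches

    sched = ISlip(4)
    # Input 1 loses the grant for output 0 this slot and retries later.
    print(sched.schedule([{0, 1}, {0}, {2}, set()]))  # -> {0: 0, 2: 2}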

6. Conclusion

Multiterabit packet-switched networks will require high-performance scheduling algorithms and architectures. With port densities and data rates growing at an unprecedented rate, prioritized scheduling schemes will be necessary to scale pragmatically toward multiterabit capacities. Advanced scheduling schemes exploit concurrency and distributed computation to offer a faster, more efficient decision process. Further, the need to support strict QoS requirements for the diverse traffic loads characterizing emerging multimedia Internet traffic will only increase. Continuous improvements in VLSI and optical technologies will stimulate innovative solutions to the intricate packet-scheduling task.

7. References

* DWDM tutorial, International Engineering Consortium: www.iec.org/tutorials/dwdm
* IEEE Computer: www.computer.org
* Stanford University: www.ee.stanford.edu/
* Key Network Technologies, 3Com: www.3com.com/technology/
