Vlsi Final Notes Unit4

Download as docx, pdf, or txt
Download as docx, pdf, or txt
You are on page 1of 21

UNIT 4

VLSI Design (ECE 401 E)


DELAY MODELS
Delay modeling of ICs can be done in several ways, resulting in trade-offs between computation times and
analysis accuracy. Consider the following timing analysis methods:

Analog Simulation uses a simulator such as SPICE. These simulators require models of the I-V
characteristics of the devices. The analysis is done using numerical integration methods. Worse case
delay analysis requires finding worse case input patterns. These methods are very time-consuming, and
thus timing analysis using these methods is all but impractical, except for special problems.
Timing Simulation uses models of circuit and interconnect delays. As an example, consider the Crystal
timing simulator. In this case simple models of gates and interconnections are used, and thus timing
analysis can be done efficiently on large circuits. However, these methods pay a price in terms of
accuracy.

In static logic circuits, delay is measured between the 50% point of the waveforms, as shown in Figure 110.
This choice assumes that the switching points of all gates are approximately at V DD/2. The time from the
10% point of a waveform to the 90% point is called the rise time t r. The time from the 90% to the 10% point
is referred to as the fall time tf.

Figure 110. Definition of Rise Time and Fall Time.

1. Gate Delay Models


The simplest gate delay model sets a fixed propagation time, or gate delay TD, from a gate input to a gate
output. This model is useful particularly with combinatorial optimization algorithms such as technology
mapping, and in simple logic simulators that are used early in the design process. However, this model is
very inaccurate as it neglects many important influences on gate delay such as:

Output Lading: As capacitive loads caused by the input capacitance of the fanout gates increase, gate
delay increases.
Wire Loading: As capacitive loads caused by the wiring network driven by the gate increase, gate delay
increases.

Input Waveform Shape: Slow input signals (signals with large rise times) cause long gate delays.

A more detailed model of gate delay is the on-resistance model shown in Figure 111. This model consists of
the following components: the input gate loading or gate capacitance C g, the output gate loading capacitance
Cd, the intrinsic gate delay TD, and the gate output on-resistance Ron. The function of a gate can be described
as follows. Suppose the input patterns of the gate change, causing a change in the output of the signal.
Suppose the latest change occurs at some time t. Then the gate output begins changing at time t + T D.
Suppose the gate is driving a load capacitance C L, then the rise time constant of the output waveform is =
Ron (Cd + CL). Given some assumptions about the shape of the waveforms (exponential or ramp), these
figures can then be used to compute gate delays.
Although inaccurate (the accuracy of this model is around 20%), the on-resistance model can describe many
properties of physical designs. For example, the effects of increased capacitive loading are clear. Also, as
shown in following sections, this model is suitable for efficient delay modeling of lossy interconnections.

Figure 111. (a) Schematic for a simple CMOS inverter driving a distributed RC wiring load
and a single gate load. (b) On-resistance buffer and interconnection delay model,
the distributed RC wiring is modeled with lumped components.

2. Models for Interconnect Delay


Figure 112 shows a three-dimensional view of a wire and an electrical model including several parasitic
effects . Interconnect delay is a result of these parasitic effects:

Interconnect Capacitance Cw, which arises from three components: the area component, also referred
to as parallel plate capacitance; the fringing field component that accounts for the finite dimensions of
wires; and the wire-to-wire capacitance between neighboring wires. In modern IC technologies the latter

two are becoming comparable to parallel plate capacitance. Given a wire of width w, length l, dielectric
thickness tox, a dielectric constant ox , and the fringing field factor, the wire capacitance is

C w = ox wl f r
t ox

Interconnect Resistance Rw, which arises since conductors in modern ICs are not ideal. Given the
resistivity of the wiring material , and the thickness of the wire h, the wire resistance is
l
Rw =
wh

Other effects such as interconnect inductance Lw and shunt conductance Gw . These effects have been
negligible up to date in IC technologies; however, they may become more important with decreasing
dimensions.

Figure 112. (a) Three-dimensional view of an IC wiring structure. The dimensions of the wire are its
length l, width w, thickness h, and its distance from a ground plane tox.
(b) Electrical model of the parasitic effects of the wire.
This section considers various interconnect models that handle parasitic capacitance and resistance. The
most accurate model of interconnect at high speed is the transmission line model, where wires have
capacitance, resistance, and inductance per unit length. These models are difficult to use and are
unnecessarily complicated for present-day IC technologies. Transmission line analysis is only called for
when the time of flight is longer than the signal rise times. This effect only occurs in the fastest of modern
IC technologies such as gallium arsenide ICs. For this reason, this section focuses on two practical
interconnect models, the distributed RC network model and the lumped RC network models.

3. Delay in RC Trees

The general time-domain solution of any RC network is a well-known problem in numerical analysis. An
RC network (as shown in Figure 113) produces a linear system of differential equations that can be solved
for an exact response. In general, given a linear network with n storage elements the response of the Laplace
transform (or frequency domain) of the network, H(s) = L(h(t)) can be expressed in the form
H ( s )=

H ( s )=

1+a1 s ++ am s

s ( 1+ b1 s++bn sn )
k1
k
n
+ + i + +
s p1
s pi
s pn

Which has the time-domain solution


p 1t

pi t

h ( t ) =k 1 e + +k i e ++ k n e

pn t

where pi and ki are the poles and residues, respectively, of the frequency domain expression. Unfortunately,
the problem of computing the exact poles and residues from the frequency domain solution is a
computationally expensive problem. Furthermore, it has been observed in practice that the response of
practical RC networks is dominated by a few poles and residues.

Figure 113. Example of RC Tree for a CMOS VLSI Circuit with a fanout of 2.

TIMING DRIVEN PLACEMENT


Timing-driven placement incorporates timing objective functions into the placement problem. Depending on
how important the timing is, designers can first try to fit the timing requirements and then minimize the area,
or vice versa. Nets that must satisfy timing requirements are called critical nets. The criticality of a net is
dictated by input and output timing requirements. In timing-driven placement, one tries to make critical nets
timing-efficient and other nets length- and area-efficient. Timing can be represented by clock period T,

T =S +d max +T 0
where S, dmax, and T0 represent clock skew, maximum delay, and the adjustment factor, respectively.
Placement is a very important step in designing timing-efficient circuits. If a placement is bad in terms of
timing, even a very powerful router cannot help. There are basically three approaches to the timing-driven
placement problem: the zero-slack algorithm, the weight-based placement, and the linear programming
approach.

1. Zero Slack Algorithm


Consider a circuit without any loops, represented by a directed acyclic graph (DAG). If a circuit has loops, it
is transformed into one without loops by removing a small number of edges. These edges will later be placed
back in the circuit and the timing is verified. Inputs to the circuit (i.e., vertices with in-degree zero) are
called primary inputs (PIs) and outputs of the circuit (i.e., vertices with out-degree zero) are called primary
outputs (POs). Assume the signal-arriving time at each PI and the time a signal is required at a PO are
known. Gate delays are also given. By tracing a signal path starting from the PIs, the arrival time t A of the
signal at each pin (i.e., at inputs and the output of each gate) can be obtained. Here, it is assumed that each
gate has single output; however, the fanout of a gate could be more than one. This is equivalent to saying
that all outputs of a gate carry the same signal. This model can be extended to include one in which each
gate has k distinct output signals. This is accomplished by duplicating the gate k times. The input to each of
the duplicates is the same as the input to the original gate. However, each duplicate will have only one of the
outputs of the original gate.
Similarly, by starting from the POs and tracing back the path of each signal, the required time t R at each pin
is obtained. Define

slack = tR tA.

..(1)

Note that the arrival time at the output of a gate is known only when the arrival time at every input of that
gate has been computed. Thus, a topological sorting is performed on the gates (vertices) to find the arrival
times. (A topological sort is the process of visiting the gates one by one. A gate can be visited if all its inputs
are PIs or when all the gates providing signal to it have already been visited.)
The relationship between the arrival time of an output pin,
cells,

tA ,

tA ,

t A , .. , t A

t Aj , and that of the output pins of the fanin

is expressed in Equation 1. Terms

dn

dn

, ..,

dn

are net

delays, and D is the delay of the cell containing pin j . The relationship is also shown in Figure 114.
t Aj =max mi=1 ( t Aj + d n ) + D

..(2)

Deciding the required times is done in a similar manner, except the signal direction (i.e., the edges of the
DAG) is reversed when topological sorting is applied. The relationship between the required time of the
output pin,

t Rj , and that of the related output pins of the fanout cells,

t 1R ,

t 2R ,

t 3R , .. , t mR

is

shown in Figure 115 and expressed in Equation 2. The terms D 1, D2, D3, . . ., Dm are cell delays, and dn is the
delay of the net that connects the cell to its fanout cells.

t Rj=minmi=1 ( t RjDi ) d n

..(3)

The zero-slack algorithm associates a maximum allowable delay to each net in an iterative manner. In the
first iteration, the delay of each net is assumed to be zero. The purpose of the zero-slack algorithm is to
distribute slacks to each net. It transforms timing constraints to net length constraints. Consequently, primary
inputs, primary outputs, and the outputs of each cell will have zero slacks.

Figure 114. Computation of arrival time at an output pin.

Figure 115. Computation of required time at an output pin.

An example is demonstrated in Figures 116 122. A triplet t A/slack/tR is shown at each (input and output)
pin. Figure 116 shows the input circuit with the delay of each net initially set to zero. The arrival times of
primary inputs and the required times of primary outputs are given. Slacks are calculated according to
Equations 1 3. The minimum positive slack is four and the corresponding path, then one is arbitrarily
selected. In Figure 117, the four units of slack have been distributed along the path segment of Figure 116. It
indicates that each of the segments on the path can have length proportional to the assigned length (in this
case, 1) in the final layout. A unit-delay model is assumed; that is, the propagation delay along a wire of
length l is O(l). After slack distribution, slacks on the path segment become zero. Then the new slacks are
calculated. The new path segment is primary input I2 and the net connected to it. Figures 118 122
demonstrate subsequent steps. Figure 122 shows the final result, where the output pin slack of each gate is
zero. The just described algorithm is summarized as follows:
procedure zero-slack
begin-1
repeat until no positive slack;
compute all slacks;
find the minimum positive slack;
find a path with all the slacks equal to the minimum slack;
distribute the slacks along the path segment;
end-1.

Figure 116. The zero slack algorithm example, step 1.

Figure 117. The zero slack algorithm example, step 2.

Figure 118. The zero slack algorithm example, step 3.

Figure 119. The zero slack algorithm example, step 4.

Figure 120. The zero slack algorithm example, step 5.

Figure 121. The zero slack algorithm example, step 6.

Figure 122. The zero slack algorithm example, step 7.

2. Weight Based Placement


Most of the existing placement algorithms are designed to minimize certain cost functions. Therefore, if
weights derived from timing considerations are incorporated into the cost function, a timing-driven
placement algorithm is obtained. The approach for determining weights will be described. Motivated by that,
a timing-driven placement technique can be summarized as follows.
Step 1. Find a layout using a placement algorithm that incorporates weights.
Step 2. Perform a timing analysis.
Step 3. Assign weights to critical paths or nets.
Step 4. Go to step 1, using the new weights.
The approach is widely used in hierarchical placement algorithms. The reason is that partitioning is a central
step there and it is very easy to consider weights in the partitioning step. A disadvantage of the technique is
that it is hard to predict the result of changing weights as new critical paths or nets may be created in
subsequent steps. Moreover, the iteration behavior becomes uncontrollable, especially in aggressive designs,
where most paths are critical.
One approach to deciding appropriate net weights is to use a measure of node criticality (where only the
capacitive delay of the nets is considered) or measures of edge criticality (where the RC delay of the nets is
considered) as weights . Criticality was used as a guide to select cells from a cell library. (In a cell library,
there are cells with different driving power and area that implement the same function. This criticality can be
used as weighting function.

3. Linear Programming Approach


A linear programming (LP) based performance driven placement algorithm is proposed. It is a hierarchical
technique in which linear programming is an important step (at each level of hierarchy an LP is solved). The
basic idea is to express spatial and timing constraints using a set of linear inequalities and to try to minimize
a cost function that combines timing and area. Also, timing constraint information is considered by path
delays instead of net delays. This expands the solution space being searched.
It is assumed that each cell has only one output pin, the delay of a net is proportional to the length of its
bounding box (it treats wires as capacitors), and there is a single output resistance for all cells .
However, after linear programming is applied to all cells, the cells may overlap each other or cluster in some
places. Linear programming can be performed in a hierarchical manner to minimize the cost function.
Levels of hierarchy are defined by recursive partition until each cell is in a distinct slot. Each partition will
divide the target region into two and move suitable numbers of cells, which are in a crowded region, to
another region. The likelihood of this move depends on the distance between the two regions. At the next
level, linear programming is applied to the cells that are located in the half-regions assigned at the previous
level of hierarchy. The procedure is summarized formally as follows.
procedure performance-driven placement (circuit specification)
begin-1
apply linear programming to place the cells;
(* the placement may have overlaps *);
partition the current region into two subregions:
left and right, or bottom and top, alternatively;
evenly redistribute the cells;

call performance-driven placement (left or top);


call performance-driven placement (right or bottom);
end-1.
An advantage of this technique is that almost any constraint can be incorporated into the formulation. The
main disadvantage of the approach is that it is time-consuming.

TIMING DRIVEN ROUTING


Timing-driven placement has been given much attention because it is of fundamental importance. That is, if
attention is not given to timing issues at the placement level, not much can be done in the routing phase.
However, timing-driven routing can also improve circuit performance.
Linear delay models assume that delay along a wire of length. While less accurate than the distributed RC
tree delay model, the linear delay model has been effectively used in clock tree synthesis. A more accurate
delay model is the Elmore delay model, in which the delay calculation is recursive and somewhat involved.
An instance of timing-driven routing consists of a set of points in a two dimensional plane. One of the points
is the source and the rest are the sinks. The distances are measured in metric or rectilinear metric, the metric
commonly used in VLSI environments. Most algorithms can be adapted in other geometries, such as 45
geometry (allowing horizontal, vertical, and plus and minus 45 lines) and Euclidean geometry (allowing
arbitrary angles).

1. Delay Minimization
Delay minimization can be used to obtain an effective timing-driven global router. The procedure can also
be used as a detailed router.
Let the radius R of a signal net be the length of a shortest path from the source s to the furthest sink, that is,
R = maxxN dist(s, x). The radius of a routing tree T, denoted by r(T), is the length of a shortest path in T
from the source s to the furthest sink. Clearly, r(T) R, for any routing tree T. According to the linear RC
delay model, the interconnection delay of a net is minimized by minimizing the radius of the routing tree,
that is, the maximum interconnection delay between the source and the sinks. Furthermore, the goal is also a
routing tree with small total cost. Here, the total length of the tree is used as the cost. Other cost measures
might be a measure of the crowdedness of the channels or the number of bends, proportional to the number
of vias, in the net.
In order to consider both the radius and the total length in the routing tree construction, bounded radius
minimum routing trees (BRMRT) are used. The reason for minimizing the length is that it has been observed
that global routers aiming to minimize wire length produce small chips. Given a parameter 0 and a signal
net with radius R, a BRMRT is a minimum-length routing tree T with radius r(T) (1 + ) x R.
The parameter controls the trade-off between the radius and the length of the tree. When = 0, the radius
of the routing tree is minimized and thus an SPT (shortest path tree) for the signal net is obtained. On the
other hand, when = , the total length of the tree is minimized and an MST (minimum spanning tree) is
obtained. In general, as grows, there is less restriction on the radius, allowing further reduction in tree cost.
Figure 123 shows an example where three distinct spanning trees are obtained using different values of .

Figure 123. Three different spanning tree with distinct values.

2. Clock Skew Problem


In present and future circuit design, speed is of fundamental importance. Delay on the paths is a major factor
in determining the speed. Another important factor is the clock skew that is caused by asymmetric clock
distribution. Clock skew increases the total clock cycle time. Consider a clock tree consisting of a source
and a set of sinks. The delay of the path with longest delay between the source and a sink is denoted by d max.
Similarly, the delay of the path with smallest delay between the source and a sink is denoted by d min. The
value dmax dmin is the clock skew. In most cases, clock trees with skew equal to zero (and small total length)
are most desirable. See Figure 124, where an example of a tree with a large skew (left) and one with a small
skew (right) are shown. The operation shown in the figure is called H-flipping.

Figure 124. Transforming a clock tree with large skew to one with small skew.
The minimization of clock skew has been studied in the past few years. First, researchers proposed to use the
H-tree construction in systolic arrays. The H-tree construction can be used when all components have equal
sizes and are regularly placed. When all blocks are organized hierarchically, a clock distribution technique
was proposed in. It was assumed that the number of blocks at each level of hierarchy was small since an
exhaustive search algorithm was employed to enumerate all the possible routes.
More recently, two recursive algorithms were proposed. They are both based on the linear delay model;
recall that the delay along a path of length l in a given routing tree is O(l). First, Jackson, Srinivasan, and
Kuh proposed a top-down approach to the problem. Their algorithm (the JSK-algorithm) recursively
partitions the problem into two equal-sized (i.e., with equal number of points) subproblems. At each step, the
center of mass of the whole problem is directly connected to the center of mass of the two subproblems, as
shown in Figure 125. The center of mass of a point set {P 1,.....,Pn}, where Pi has coordinates xi and yi is

defined as a point with x-coordinate x = (x1 + . . . +xn)/n and y-coordinate y = (y1 + . . . + yn)/n. It is shown
that clock skew (dmax dmin) is bounded by O(1/

n ) for a point set uniformly distributed in the unit

square, where n is the total number of points.

Figure 125. Example of the top-down JSK-algorithm.


The second recursive algorithm was proposed by Cong, Kahng, and Robins. Their algorithm is a bottom-up
technique based on recursive matching. Let {P1,.....,Pn} be the set of sinks, each being a point in a twodimensional Euclidean plane. The set of points are matched-a minimal (length) matching has been employed
as shown in Figure 126(a). Then the centers of the just matched points, denoted by

{ P11 , ., P1n /2 }

, are

found (Figure 126(b)). Here, the center is defined as the point that minimizes the length from it to the
furthest node minus the length from it to the closest point. That is, let c be the selected center, l max the length
of the longest path from c to a sink, and l min the length of the shortest path from c to a sink. The center c is
selected to minimize lmax - lmin. At this stage the generated center points are also matched as shown in Figure
126(b). This procedure is repeated until all the points are matched. The final tree is obtained as shown in
Figure 126(c). It is shown that the total length of the final tree for a set uniformly distributed in the unit
square is O(

n ).

Figure 126. Recursive steps in the bottom-up matching algorithm.

3. Buffered Clock Trees


Modern high-speed digital systems are designed with a target clock period (or clock frequency), which
determines the rate of data processing. A clock network distributes the clock signal from the clock generator,
or source, to the clock inputs of the synchronizing components, or sinks. This must be done while
maintaining the integrity of the signal, and minimizing (or at least upper bounding) the following clock
parameters:

the clock skew, defined as the maximum difference of the delays from the clock source to the clock pins;
the clock phase delay (or latency), defined as the maximum delay from the clock source to any clock
pin;
the clock rise time (or skew rate) of the signals at the clock pins, defined as the time it takes the
waveform to change from a VLO to a VHI value;
the sensitivity to the parametric variations of the clock skew, clock phase delay, and rise time. The most
important sensitivity is to the clock skew.

Additionally, since in modern microprocessors the clock networks can consume a large portion of the total
microprocessor's power (up to 50%), the clock design objectives must be attained while minimizing the use
of system resources such as power and area. In high performance systems, these constraints can be quite
severe. Consequently, the layout of a good clock distribution network is difficult and time-consuming.
A very basic concern in the construction of clock trees is the integrity of the clock signal. Since the chip
interconnect is lossy, the clock signal degrades as it is transmitted through the network. The longer the path
from the source to the sink, the longer the rise time of the clock signal. If the rise time of the clock signal
exceeds the clock period, then the clock signal is unable to reach suitable logic high and low values. This
problem can be remedied by several different clock network design strategies.
Other work on buffered clock trees has focused on minimizing the phase delay of the clock tree on the
assumption that reducing clock tree delays also reduces skew and skew sensitivity to process variation. In
these works, the delay model usually consists of a fixed on-resistance and delay for each buffer. More
recently, work in considers the minimization of skew and delay in the presence of process variations in
buffered clock trees.
Various wiring-oriented approaches exist to resolve these problems. First, it is possible to taper the wire
widths of the clock tree such that the rise times at the terminals are decreased and perhaps satisfied (Figure
127(a)). Another alternative is to increase the capacitance and decrease the resistance on the source-to-sink
paths by using wide bus techniques (Figure 127(b)). This approach can be taken to extremes by constructing
a fat clock bus from which other clock pin connections are obtained (as shown in Figure 127(c)). This last
type of a clock tree has to be driven by a very large buffer, as in Figure 128(a); thus large amounts of power
are consumed by the clock driver and the clock interconnect. However, since the delays of the clock tree are
dominated by the capacitance of the wide bus, the bus-to-clock pin interconnection can be done simply and
efficiently.

Figure 127. (a) Clock tree with tapered wire widths. (b) Clock tree using a wide bus approach.
(c) Fat bus clock tree construction
There are also buffer-oriented approaches to clock network constructions. Buffer-oriented approaches
partition the clock tree into sections using buffers. Consider a buffered clock tree or clock power-up tree,
which contains buffers in its source-to-sink paths. Several examples of buffered clock trees are shown in
Figure 128. In this case, to maintain the integrity of the clock signal, it is sufficient to restrict all the rise
times in the network to values smaller than the clock period. The addition of buffers in source-to-sink paths
reduces the resistance in these paths, and therefore reduces the signals' rise times. As a result, the number of
buffers needed by the clock tree is bounded below by the clock's target clock period. Algorithms are
proposed to compute the minimum number of buffers required to satisfy a given clock period. Bufferoriented and wiring-oriented clock designs exhibit several trade-offs.

Wiring-oriented solutions are not very sensitive to fabrication variations, while buffer-oriented solutions
may be more so.
The construction of wiring-oriented networks requires careful analysis and modeling of the interconnect
networks, while buffer-oriented trees must also take into account detailed buffer delay modeling.
Buffer-oriented solutions potentially consume less power than wiring oriented solutions.
Embedding the buffers into the placement may not be as easy as constructing a wiring-oriented solution.

Consider buffered clock tree constructions along with process variations and realistic buffer delay models.
To obtain zero skews while reducing sensitivity to process variations, buffered clock trees usually have
equal numbers of buffers in all source-to-sink paths. Furthermore, at each buffered level the clock tree
buffers are required to have equal dimensions (Figure 128(c)) to avoid buffer parameter variation and buffer
delay model accuracy problems. Using the buffer delay model described earlier, two additional steps are
required to obtain zero-skew clock trees: the capacitive loads of the buffers in one level must be the same
(this is called load matching), and the rise times at the inputs of the buffers at one level must also match.
Note that wiring-oriented techniques may be applied at each buffer stage to produce load and rise time
matching.

Figure 128. (a) A buffer chain with tapered sizes driving a clock tree. (b) Clock power-up tree,
buffers of a single size. (c) Clock power-up tree, with buffers of tapered sizes.
that all buffers on the same level have the same size.)

with
(Note

The clock network construction problem presents trade-offs between wire length and skew, and between
variation of parameters under manufacturing conditions (effectively affecting chip yield) and power
consumption. These trade-offs present a challenge to the designer and to those seeking to automate the clock
design process.

VIA MINIMIZATION
Vias (or contact holes) between different layers of interconnection on dense integrated circuits can reduce
production yield, degrade circuit performance (by increasing propagation delay), and occupy a large amount
of chip area (as the physical size of a via is larger than the wire's width). Also, vias on multilayer printed
circuit boards raise the manufacturing cost and reduce product reliability. Thus, much research effort has
been focused on minimizing the number of vias.
There are basically two categories of via minimization problems, constrained via minimization (CVM), and
unconstrained via minimization (UVM). In constrained via minimization, given a routing region, a set of
modules and terminals, and the layout of the nets (the detailed routing), the question is how to assign the
wire segments to the layers such that no two wires of different nets intersect in the same layer and the
number of vias is minimum. In an unconstrained via minimization problem, the layout of the nets is not
given, and the question is how to find a topological (or global) routing such that the minimum number of
vias is needed.

1. Constrained Via Minimization


This section focuses on constrained via minimization in two-layer environments on circuits involving twoterminal nets. A circuit involving multiterminal nets can be partitioned into one with two-terminal nets.
Figure 129(a) shows a layout before via minimization. The objective is to find the position for vias such that
no two wires of different nets intersect in the same layer and the number of vias is minimum. Figure 129(b)
shows an optimal solution. An algorithm is presented below for finding a minimum via solution, along with
several examples.

Figure 129. CVM. (a) Before via minimization. (b) After via minimization.
Consider an instance of CVM shown in Figure 130(a). A crossing graph is one where the vertices represent
the intersection of nets and an edge is present between two adjacent intersections. The crossing graph of the
layout shown in Figure 130(a) is shown in Figure 130(b). The faces of the graph are labeled as a, b,, g,
where face g is the infinite (or boundary) face. From the result of CVM, represented by a plane graph G, the
dual graph G' is obtained. A face with an odd number of edges on its boundary in G is an odd-cycle face. An
even-cycle face is similarly defined.

Figure 130. An example of CVM and its crossing graph.

2. Unconstrained Via Minimization


In contrast to the constrained via minimization, an unconstrained, or topological, via minimization (UVM)
considers topological (or relative) routing of the wires and layer assignment simultaneously. Techniques for
solving UVM, first decide the topological information among the signal nets such that the number of
required vias is minimum. Then a geometrical mapping of the topology to the layers is performed. An
example is shown in Figure 131. The topology for the nets interconnecting the given terminals is shown in
Figure 131(a), and the layout after geometrical mapping is shown in Figure 131(b).

Figure 131. UVM. (a) Topology of the net with one via; (b) geometrical mapping.

Figure 132. Topological routing graphs. (a) A circle graph; (b) net intersection graph; (c) a max-cut.
Consider a routing region and a set of terminals. The routing region is characterized as a circle and the nets
are drawn as chords. The intersection graph of the chords is called a circle graph, as shown in Figure 132(a).
The vertices of the net intersection graph (see Figure 132(b)) represent the nets, and the edges represent
intersections between the corresponding nets. That is, an edge (i, j) exists if nets i and j intersect in the circle
graph. The goal is to delete a minimum number of vertices from the intersection graph, such that the
resulting graph is bipartite; see Figure 132(c). This creates two sets of vertices that can be placed into two
layers without any vias. In the figure, L 1 represents a set of nets assigned to layer 1, and L 2 represents a set
of nets assigned to layer 2. The sets L 1 and L2 define the maximum sets of wires that can be routed on two

layers without vias. The deleted vertices are referred to as the residual nets. The problem now reduces to one
of routing the residual wires.

3. Other Issues in Via Minimization


As VLSI technology advances, particularly as more than two layers are being used in industry, via
minimization will become increasingly important. Researchers provided a more general formulation of
UVM, and considered variations that arise due to differences in multilayer technologies such as VLSI and
high performance packaging. Their formulation allows pins to be assigned to a layer, more than two pins per
net, and regions of unordered pins that occur in channel routing problems. Various restricted classes are
discussed; problems in most classes are NP-hard. Another class of via minimization problem is allowing
modification of the given layout. Two procedures are used to modify the layout. The first one is to search for
an alternate route to eliminate a via. The second one is to improve the result by changing the layer of wire
segments by shifting vias along wire segments.

POWER MINIMIZATION
Power consumption in CMOS circuits is due to the dynamic power consumption of charging and
discharging capacitive loads during output transitions at gates, the short circuit current that flows during
output transitions at gates, and the leakage current. The last two factors can be made sufficiently small with
proper device and circuit design techniques; thus, research in design automation for low power use has
focused on the minimization of the first factor, the dynamic power consumption.
The average dynamic power consumed by a CMOS gate is given below, where, C l is the load capacity at the
output of the node, Vdd is the supply voltage, Tcycle is the global clock period, N is the number of transitions
of the gate output per clock cycle, Cg is the load capacity due to input capacitance of fanout gates, and C w is
the load capacity due to the interconnection tree formed between the driver and its fanout gates.
V 2dd
V 2dd
Pav =0.5
C N =0.5
( C +C ) N
T cycle l
T cycle g w
To minimize the power following techniques are used:
1. Reducing chip and package capacitance: this can be achieved through process development with
partially and fully depleted wells, CMOS scaling to submicron device sizes, and advanced interconnect
substrates such as Multi-Chip Modules (MCM). This approach can be very effective but is very
expensive.
2. Scaling the supply voltage: This approach can be very effective in reducing the power dissipation, but
requires new IC fabrication processing. Supply voltage scaling also requires support circuitry for low
voltage operation including level converters and AC/DC converters as well as detailed consideration of
issues such as signal to noise margins.
3. Employing better design techniques: In this approach the investment to reduce power by design is
relatively small in comparison to other techniques.
4. Using power management strategies: The power saving that can be achieved by various static and
dynamic power management techniques are application dependent but can be significant.

OTHER PERFORMANCE ISSUES

Timing-driven placement, timing-driven routing, the via minimization problems, and the power
minimization problem are the problems given the most attention in the literature. Other problems have been
briefly studied from a timing point of view. A summary of these results follows.
Timing-driven partitioning consider both timing and capacity constraints in the partitioning phase. The main
application is to assign functional blocks into slots on multichip modules during high level design. It
provides fast feedback on impact of high level design decisions. In the first phase, clustering is done to
ensure the timing constraints are satisfied. Then a packing, followed by a traditional partitioning (e.g., a
Kernighan-Lin technique), is done to satisfy capacity constraints. Compared to a classical partitioning
scheme, experiments show that the new technique has no timing violation if the capacity is increased by a
small percentage (about 5 to 6% on the average).
Some factors that determine the suitability of a decomposition are the geometry of the modules, the
connectivity among modules, and timing constraints. A hierarchical-clustering-based algorithm that leads to
a small number of superior candidate hierarchies. An approach to perform simultaneous placement and
routing of hierarchical IC layouts is the method based on the concept of slicing, which introduces a specific
regularity to the problem with certain constraints on the chip floorplan. The placement part consists of a
number of interrelated linear ordering problems, corresponding to the ordering of slices. The routing part
outlines a hierarchical pattern router, applicable to slicing structures. The IC layout construction progresses
top down in small design increments, corresponding to the processing of the individual slices. The
placement and routing data are acquired at intervening time slots and influence each other at each level of
hierarchy.
Timing-driven floorplanning is a mathematical programming (i.e., a constrained nonlinear programming)
approach that incorporates both timing and geometric information in the formulation. The three steps
employed are timing minimization with module overlap, module separation, and timing minimization
without module overlap. To obtain a fast algorithm, techniques for removing redundant paths are used; thus,
the number of constraints is reduced. The information obtained on the initial placement is fed into a
floorplan package to find the final shapes, sizes, and pin locations.
A new three-layer over-the-cell channel routing algorithm in which router minimizes the channel height by
using over the cell areas and achieves the net's timing requirements; each net has a maximum allowable
upper bound on its length. In this work, 45 wire segments are used to route the nets over the cell to further
reduce the net length. Experimental results show that the same area as used by previous over the cell routers
can be obtained while satisfying length constraints.
Other performance issues in physical design include regularity, layout synthesis systems, and automated
generation . Regularity is an important issue inVLSI design. Ishikawa and Yoshimura presented a new
module generator for generating functional modules with structural regularity. Unlike most module
generators that are only able to place cells, the proposed generator can generate functional modules with
structural routers for regular structure layouts such as multipliers and RAMs. A synthesis system for the
automatic layout of NMOS gate cells is described. The cells are based on multigrid cell models and are
intended for use as part of a chip synthesis system. An outline of the basic concepts of a CAD procedure for
the layout synthesis of such cells is given. The main objective is to generate correct and compact cells with a
controlled growth in area when they are subjected to modified speed requirements. The key is to determine
the most profitable level of abstraction for the designer, which is accomplished by the introduction of macro
abstraction, interface inheritance, delay binding, and the complete decoupling of procedural and graphical
design information. These abstraction mechanisms are implemented in the regular structure generator, an
operational layout generator with significant advantages over first-generation layout tools.

Recently, a new class of placement strategies, incorporating the knowledge of underlying circuit structure
targeted for high-performance circuits. The net regularity is measured using the standard deviation of the
locations of its terminals. Experimental results showed that exploiting circuit structural information led to
substantially more regular nets, compared to placement without structural information. Benchmark circuits
showed 1.9 to 26 times the number of nets with zero standard deviation. Furthermore, the length of the
longest net in a chip was reduced from 5 to 22%. The placement strategy is crucial for high-performance
circuits since regular nets have relatively fewer vias and reducing the longest net length improves signal
delay.
Some other performance issues are minimize the number of power pads, electrical performance, critical area
and bend minimization.

You might also like