ORCA

Download as pdf or txt
Download as pdf or txt
You are on page 1of 6

ORCA A Sea-of-gates Place and IRoute System

Mitsuru Igusa, Mark Bearcdslee and Albert0 San.giovanni--VincentelK

Department of Electrical Engineering and Computer Sciences


University of California, Berkeley, California 94’7’20

Abstract the passage of more global inter-cell connections. For


that reason, and to simplify the placement and rout-
Sea-of-gates is becoming an important design style for ing problems associated with these arrays, the inter-
Application Specific Integrated Circuits (ASECs). The cell connections are implemented on a rectilinear grid
sea-of-gates technology offers more flexible placement in the “channels” between the cells. Over-the-cell inter-
and routing options not available in gate arrays. Very connections are typically characterized by straight 2nd
few systems are available today that can automatically layer metal “feedthroughs”.
layout sea-of-gates and none of these systems effectively This recent variation of the gate-array structure is
use the features available in sea-of-gates architecture. often referred to as a %ea--of-gates” array [l] and, as
ORCA is a place and route system for se&of-gates, well as providing new challenges for t:he CAD commu-
whose objective is to produce the highest density lay- nity, promises to replace many of the designs previously
out by fully exploiting the inherent features of this new undertaken in the conventional gate-*array style. The
design style. The ORCA system starts with a mod- routing area is not organized a priori into channels as
ule generator which preprocesses memory arrays and in standard gate-arrays. The basic building blocks in
other logic with a regular structure to form hig,h den- this design style are similar to the ones used in stan-
sity macros. The remaining logic is clustered together dard gate-arrays. In the sea-of-transistor architecture,
to form flexible macros, which we call porous., The using CMOS processing technology, transistors are in a
porous macro-cells allow global routing to pass through compact regular arrangement in rows (columns) of com-
the macro instead of detouring around its perimeter. plementary pairs.
The porous macros are dynamically shaped and resized Given the tight arrangement of the transistors, it is
by interaction with global wiring analysis. Finally, a possible to design large macros effectively with little
general channelless area router has been developed to area penalty. For example, it is possible to design RAM
address the multiple layers of interconnect and routing blocks using the transistor array. This capability makes
areas which will be dominantly over-the-cell. Due to the sea-of-gates approach more competitive than gate-
the large size of the problem (e.g. 100,000 gates), the array and possibly an alternative to some low volume
placement and routing algorithms are hierarchical. standard-cells and macro-cells. However, the automatic
generation of chips which fully utilize t,hese inherent c&
pabilities of sea-of-gates is more difficult than with the
1 Introduction standard gate-array design style.
Three iayers of interconnects appear more often in the
The “gate-array” (also referred to as master-slice, or larger VLSI designs in this design sty.le due to the po-
uncommitted logic array) is by far the most cctmmon tential increase in chip utilization and performance (i.e.
design style for ASICs. The computer aids for gate- shorter wire length and decreased capacitance). The ac-
array design are also the most advanced and the most tive area utilization could be dramatically increased, but
mature. In this approach, a two-dimensional array of the wiring problem becomes more complex and must ef-
replicated transistors is fabricated to a point just prior ficiently deal with over-the-cell routing.
to the interconnection levels. Generally a two-level in- In this paper, we present the ORCA system, which
terconnection scheme is used for signals and, in some consists of module generators, placement, and routing
approaches, a third more coarsely defined layer of inter- tools. The system is designed to support a variety of
connections is provided for power and ground connec- layout styles:
tions. Because one or more interconnection layers are
used within a group of transistors to define the function 1. standard-cell like, where the basic cells are ar-
of a cell, these intra-cell interconnections often block ranged in rows of equal height and the routing area

26th ACM/IEE!E Design Automation Conference@


Paper 8.4
122 0 1989 ACM O-89791 -31 O-8/89/0006/01 22 $1.50
is arranged into channels with flexible dimensions; the layout can be routed. Finally, the area router does
the detailed routing.
2. macro-cell like, where macros can be built using
basic cells belonging to different rows of active de-
vices, and macros as well as celis can occupy any 3 Floorplanning and Placement
position of the chip within the constraints of the
architecture; The size of a sea-of-gates layout can be as large as
100,000 gates, and the largest available sea-of-gates ar-
3. porous macroxell, where the basic cells are clus- rays will undoubtedly become even bigger in the future.
tered together to form macros which are flexible While other steps of the sea-of-gates physical design
in shape and allow routing to pass through the system may subdivide their problem into a smaller more
macro. The key feature of this method is that it managable subtasks, the floorplanning/placement step
allows routing to go straight through macro blocks must, at some point, deal with the entire design all at
instead of detouring around, thus reducing wiring the same time. To be able to handle such large circuits,
length and consequently the chip area (Figure 1). fast and memory efficient algorithms are used.
One algorithm that is effective and fast is the min-
cut algorithm [2]. A variation of this algorithm [3] has
after placement after respacing been shown to complete in time linearly proportional to
1. Macro cell like the size of the network. The min-cut algorithm takes a
t general network and divides it into two groups so that
I I I I
the number of nets that connect cells in different groups
is minimized and the two groups have approximately the

all A

2. Porous macro cell


I I I
:
same size. This has the effect of grouping cells that are
highly connected so that they are close together, thereby
minimizing the amount of wiring in the final layout.
The floorplanning step performs assignment of all the
cells to restricted areas of the layout space by using
the min-cut algorithm to hierarchically partition the
circuit. The process begins by first making a cut in
the network. Then two groups of cells resulting from

ui
A
A ,I_ 1 the cut are constrained to lie inside respective halves of
the layout space. The two resulting pieces will then be
partitioned with a cut line perpendicular
The floorplanning
to the first.
hierarchy can now be represented as
a tree with the leaf nodes being groups of cells and the
internal nodes representing network cuts. The cuts are
Figure 1: Porous macro-cell continued recursively with the cut directions changing
at every level of the tree until the groups of cells at the
leaves are small.
When partitioning at lower levels of the tree, one
2 Overview of the CIRCA algo- should consider the effects of the locations of the neigh-
rithm boring cells on the cut being done. In our method, all
cuts that reside at the same level of the tree with the
The ORCA system consists of a floorplanner, a placer, a same cut direction and on the same horizontal or verti-
global router, a spacing requester, a macro adjuster and cal line are done simultaneously. At each step the cut
an area router. The floorplanner partitions the logic cir- line is defined so that it cuts a sequence of groups that
cuits into groups of ceils and assigns the cehs to areas extend entirely from one side of the layout to the other.
of the layout. The placer places the logic circuits within Figure 2 illustrates the idea; the horizontal cuts a, b,
each layout area. The global router performs a “coarse” c and d are all considered simultaneously. We found
routing of the chip and locates the routing areas that that this method is better than considering partitions
are too congested. The spacing requesler decides which sequentially with a pin propagation scheme [4].
areas of the layout need to be expanded or contracted It has also been noticed that the quality of a cut de-
based on the global routing congestion information. The pends on the parts of the layout whose locations have
placemenf adjus1er readjusts the location and placement already been determined. The cells and i/o pads that
of cell groups to accommodate the requested spacing have already been placed are allowed to exert an in-
changes. The global routing and placement improve- fluence on the cut being performed. At every cut the
ment sequence is continued until it is determined that area of the two groups is balanced, thus at the end of

Paper 8.4
123
-

+
no
qo
a
0
no
C
Macro 1

P
El Macro

rtical cut
Horizpntal cu t
+ - 7 --
d
In
rY Ttical ‘;“1 Macro 4 kacro 3]

Figure 2: Simultaneous min-cut racro 31 11 1 Cut directions changed

the floorplanning stage all of the groups should have 4 b)


approximately the same area.
Since the min-cut algorithm is a heuristic based de- Figure 3: Alternate cut directions for improved place-
terministic algorithm, the result of the algorithm i.s very ment
dependent on the initial starting conditions. A simple
yet, effective method to fix this problem is to perform
the min-cut algorithm on the same cut many times; are processed. In practice, after two or three iterations
each time starting from a different random initial state. the total wire length does not improve further.
When using many random initial starts, the measured To deal with timing constraints, the user can spec-
routing length has been observed to decrease from 10 to ify upper bounds on the lengths of critical nets. At
15 per cent. each stage of the mincut hierarchy, the placement pro-
At this point the relative locations of the cell groups gram estimates the length of critical :nets and adapts
are determined but their exact locations and shapes are its placement procedure to meet the upper bound con-
not. To determine the shape that a particular cell group straints. When the estimated length of a critical net
should take, we first generate all possible shapes that approaches its upper bound, the mine-ut aigorithm at-
it can take then choose one. The possible shapes are tempts to assign cells on the critical net to a side of the
generated by first trying to pack all the cells into a shape mincut partition, such that the upper bound constraint
one row high and as wide as necessary. Then we try is maintained.
to pack it into two rows, and on up until it forms a
shape that is only one column wide and as many rows as
necessary. Using the array of shapes generated for each 4 The routing area estimation
group and a binary tree that represents the sequence of and placement adjustment
cuts made in the floorplanning stage, the locations and
shapes of the groups is then determined using a %hape The floorplanning/placement step does not directly take
adding” approach [5]. The algorithm chooses the shape into account the space needed for wiring. In order to
for each group that minimizes the overall chip area and ensure that the placement produced can be successfully
produces a chip with a desired aspect ratio or a chip routed, we estimate the routing requi.rements using a
with an aspect ratio in a predefined range. simplified, fast but less accurate version of the global
An optimization that can be performed at this stage router used in the area router (discussed in Section 5).
is to introduce flexibility into the directions of the cuts. The global router determines the amount of additional
In situations illustrated in Figure 3, the overall layout space required in the areas which are too congested
area is improved when the cut directions are changed (where the estimated routing usage exceeds the rout-
(Figure 3b). At each appropriate position in the parti- ing capacity). The placement adjustment phase then
tion tree, both cut directions are tried and the results incorporates these requests for space into the layout.
incorporated into the shape adding process. Once the The relative placement defined by the cut tree is not
shape adder has determined the shape of the layout, the altered; instead, the extra space needed is incorporated
best direction of all the cuts in the partition tree c,an be into the appropriate shape function of the cell group in
determined. that locality, and then the “shape adding” phase is pre-
Once each group’s shape is determined, its cells are formed again. The placement adjustment is very fast
placed in a locally optimum configuration by exhaustive since the circuit need not be repartitioned.
enumeration. The objective of the group placement is The global router and placement adjustment pro-
to minimize total wire length. This step is repeated grams iterate until convergence to a design that can
since it is dependent on the order in which the groups be successfully routed at the detailed level. The inter-

Paper 8.4
124
action of the global router and the placer makes the each tile adjacent to the edge. The expandability prop-
layout appear like a large “porous” structure. This is erty is used to identify edges that have variable routing
because area is added in regions of high routing density, capacity which designate variable width channels.
and as a result the wires are allowed to form paths that A routed path of a net n, P,, is a tree in G* that
minimize their length and need not be detoured due to connects the terminals of the net. A routing subpath
routing congestion. P C P, is a connected path whose endpoints in P, are
either terminals or vertices with more than 2 edges.
Given the routing model, the objective of the router
5 Area Router is to find routing paths for each net such that no edge
e E E* has demand(e) > capacity(e).
The general purpose ORCA area router is intended for Approximations to Steiner trees are used for initial
a wide variety of routing problems which includes sea- global routing to minimize total wire length. A greedy
of-gates, macro-cell and gate-array routing. A powerful method [lo] is used to compute the approximation to
feature of this router is that it is a general channelless the Steiner tree. The initial global routing is pre-
router that handles over-the-cell routing and routing of formed without regard to congestion; only interconnec-
2, 3, or more layers of interconnect. The router is given a tion length is minimized. The next phase of global rout-
set of terminals to interconnect and a description of the ing addresses the routing congestion.
blockages, which can be of any shape and on any rout- A rip-up-and-reroute algorithm is used to move rout-
ing layer. The area router assumes a preferred routing ing from highly congested routing areas. Highly con-
direction for each routing layer. gested areas are identified by the edges in the global
The area router is based on the well known short- routing graph which have routing demands that exceed
est path algorithm commonly called the mazerouter [S]. their capacities. The rip-up-and-reroute algorithm se-
The shortest path algorithm is used for computing paths lects the most congested edge in the global wiring graph,
in the global routing, detailed routing, and intermedi- randomly selects a global wire subpath which traverses
ate routing phases. The global routing algorithm se- the edge, and deletes the wire path from the database,
lected was that used at IBM [7] with further enhance- and then tries to find a better wiring path using a short-
ments suggested by Nair [s]. The global routing algo- est path algorithm The mazerouter or minimal cost
rithm uses a shortest path algorithm which penalizes path search algorithm [6] is directed by a cost func-
paths crossing heavily congested areas, and uses rip- tion which assigns high cost to heavily congested edges.
up-and-reroute to reduce order dependency of the ini- The algorithm is greedy. Only wiring paths with lower
tial routing. The global routing results were favorable. wiring costs are accepted. The rip-up-and-rerouting
In the ORCA router, this idea of using a mazerouter is continued until none of the capacities of each global
combined with rip-up-and-reroute has been extended wiring edge are saturated.
down into detailed routing by successively refining the The cost function used in the shortest path algorithm
initial global routing grid into finer and finer grids until is linear, i.e. the cost of a path is the sum of the cost of
each global routing cell becomes a detailed routing grid each component (or edge) in the path. The edge cost is
as in [9]. Thus, we have one simple general algorithm a function of its length penalized when the edge is over
that is used for both global routing and detailed routing. saturated. In mathematical terms, the objective of the
The area routing algorithm uses the classical global path search algorithm is,
routing model. The routing area is physically divided by
a set of horizontal and vertical cut lines into a rectangu- min cost(path) = min c edgecost(
paths in G*
lar matrix of routing tiles. The cut lines are positioned eEpath
such that they coincide with channel boundaries ‘. Each
routing tile is a face in the graph G = (V, E). The rout- edgecost =
ing algorithm uses the dual graph G* = (V”, E*). Each length(e) + f(overflow(e), expandability(e)),
edge e E E* has the following properties: length, rout-
ing capacity, routing demand, and expandability. The
length of an edge e E Ek is the center-to-center distance overflow(e) = demand(e) - capacity(e).
of the tiles corresponding to the endpoints of e. The f(owerfZow(e), expandability(e)) is the penalty func-
routing capacity of an edge of E is maximum number tion for overflowed edges (,e,fZo,(e) > 0) that also
of feasible routes (on all layers of the preferred direc- takes into account the expandability of edge e.
tion) that can cross the corresponding edge in the dual The global wiring model is “refined” by subdividing
graph E. The edge capacity is computed by counting each global routing row and column into smaller sub
the number of routing tracks that span the midpoint of rows and subcolumns. Figure 4 shows the added cut
‘In cases where there are no channels, the cut lines are posi-
lines (shown in dotted lines) in the refined global rout-
tioned as close as possible to edges of the boundaries of instances ing graph. At each iteration of global routing, the re-
of circuits used in the design. finement of the global wiring model will give a more
Permission to copy without fee all orpart of this material is granted provided that the copies are not made or distributed for direct commercial
advantage, the ACM copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission
of the Association for Computina Machinery. To co~v otherwise. or to republish. reauires a fee and/or suecific oetission. Paper 8.4
125
exact picture of where the net paths should go. At the tially reduced. The cost function penalizes, in order of
very last stage of refinement, each global routing tile highest-to-lowest severity, blockages, crossovers, over-
would correspond to the intersection of one vertical and laps, wrongway wires, and finally vias.
one horizontal wiring track. The nets are then routed in
E:ven though the complexity of the algorithm is kept
the refined global wiring model using the shortest path
low by the windowing scheme, the memory requirements
search algorithm, confined to the path suggested to the
grow geometrically with each refinement of the global
higher level global route, without regard to congestion.
wiring graph. The solution to the memory problem is
to subdivide the routing Iproblem into independent sub-
problems. The routing graph G is partitioned at user
specified horizontal and vertical cut lines (Figure 5).
The problem then becomes the assignment of routing
paths which cross the partition boundaries. New pseudo
terminals are assigned to locations along the boundaries
for these routing paths. Bach pseudo terminal along one
edge e E Enboundary edges is given a termAssignCost
for each possible location along the edge, where the
most “desired” location of a terminal is given the lowest
termtissigncost. Nets which have terminals very near
edge e are given the highest priority termdssigncost.
Figure 4: Ripup and reroute path search windowing The assignment algorithm assigns each pseudo termi-
nal to a location which minimizes the total sum of each
To reduce the complexity of the shortest path search terminal/location cost.
algorithm, the higher level global routing paths are used
The assignment of terminals along partition bound-
as a guide to limit the domain of the lower level path
aries introduce extra artifical constraints that degrade
search algorithm. The path search is limited by the
the detailed routing process. By experiment, we found
wider path suggested by the higher level routing., This
that along these boundaries, we had many wiring vio-
reduces the complexity of the path search algorithm to
lations after detailed wiring. To solve this problem, we
O(length of path) instead of O((Zength of path)2). Fig-
defined new partition boundaries that are offset from
ure 4 shows how a higher level global wire path i;s used
the original boundares, merged the routing across the
as a guide in the refined global routing graph.
original boundaries and rerouted wires along the merged
In the event that overflows still exist, further iter-
boundaries.
ations of rip-up-and-reroute are performed on an en-
larged window extending past the path suggested by To handle three and more layers of routing, the area
the higher level routing. An extra component is added router assigns routing subpaths to HV layer pairs just
to the cost function to guide the mazerouter to prefer before the global routing model is refined to the last
the path suggested by the higher level routing. most detailed stage. The global routing model is dupli-
When the global wiring model is refined to the point cated for each layer, and the capacities of each edge are
where it matches the detailed routing model, the edge updated to reflect only the routing tracks found on its
cost function is changed to influence the mazerouter in layer. Nets are routed one at a time in the multi-layer
chasing vias, wrong-way-wires, and overlaps. As in the routing model in a first come first serve manner: the
coarse routing model, an overflow in the detailed rout- lowest HV layer pair is filled first, then the next HV
ing phase is present when the capacity along an edge layer pair, and so on.
exceeds the demand. However in the detailed rout-
ing model, the capacity along an edge is either one or
zero, and when an overflow occurs this means a wire
going through a blocked region (i.e. capacity = 0 and
demand >= 1) or a wire shorting another wire (i.e.
capacity = 1 and demand >= 2). The cost func-
tion used in the detailed routing phase distinguishes be-
tween these two cases, and prefers the latter. Also there
are two types of overflows, “crossovers” and “overlaps”.
The crossovers occur when a net shorts another net by
going from one side of the shorted net to the other. An
overlap is defined when a net shorts another net but
is not a crossover. By experimental analysis, we found
Figure 5: Hierarchical decomposition
that when we allow overlaps rather than crossovers, the
number of overflows after rip-up-and-reroute is substan-

Paper 8.4
126
6 Results total net length
The ORCA system has been implemented in the C
under the UNIX operating system. It is integrated
with the Berkeley CAD Design Environment [ll]. The
program was run on a variety of examples shown in
Table 1. The randomly generated random1 example
was the most difficult to route due to high connec- Table 2: Results on mcnc2 gate array example
tions/cell ratio. The mcncl and mcnc2 test cases are
the gate array benchmark examples from the 1988 Inter-
national Placement-and-Routing Workshop at the Mi- Christopher for providing some of the sea-of-gates ex-
croelectronics Center at North Carolina (MCNC). The amples.
hughesl and hughes2 examples are from industrial sea
of-gates designs. The successful routing of the hughes2
example, with close to 20,000 cells and over 20,000 nets, References
demonstrates the feasibility of the area router to handle
A. Hui et al. A 4.lk gates double metal hcmos
large problems. The smaller examples were routed with
sea of gates array. Proc. IEEE CICC, :15-17, May
3 levels of global routing refinement hierarchies, and the
1985.
large hughes2 was routed with 4.
Direct comparisons with other sea-of-gates systems PI B. W. Kernighan and S. Lin. An efficient heuris-
could not be made for any of the examples. However, tic procedure for partitioning graphs. Bell Systems
we could make comparisons of different systems on the Technical Journal, 49:291-307, Feb 1970.
standard gate array benchmark examples mcncl and
mcnc2 from the MCNC workshop. Table 2 compares t31 C. M. Fiduccia and R. M. Mattheyses. A linear-
different place and route systems by the total net length time heuristic for improving network partitions.
of the routing of the mend example 2. Even though P~oc. 19th Design Automation Conference, 1982.
there is a slight variation in the performance (measured
[41 A. E. Dunlop and B. W. Kernighan, A procedure
by total net length) of the place and route systems, all for Placement of Standard VLSI Circuits. IEEE
the systems produced the desired result: a fully routed Bans. on Computer-Aided Design, CAD-492-98,
gate array chip. This comparison demonstrates that 1985.
ORCA can do a reasonable job on standard row-based
gate arrays as well as handle se&of-gates. PI L. Stockmeyer. Optimal orientations of cells in
slicing floorplan designs. Information and Control,
57(3):91-101, June 1983.
name #nets #cells #layers
[61 C. Lee. An algorithm for path connection and its
applications. IRE Bans. Electronic Computers,
EC-10:346-365, 1961.

[71 K. H. Khokhani N. Nan K. A. Chen, M. Feuer and


S. Schmidt. The chip layout problem: an auto-
matic wiring procedure. Proc. l&h Design Au-
tomation Comf., :298-302, 1977.

PI Ravi Nair. A simple yet effective technique for


Table 1: Routing results global wiring. IEEE Tkans. on Computer-Aided
Design, CAD-6:165-172, 1987.

PI M. Burstein and R. Pelavin. Hierarchical wire


routing. IEEE Bans. on Computer-Aided Design,
7 Acknowledgements CAD-2:223-234, 1983.

The authors would like to express thanks to Alan [lOI Jr. J. B. Kruskal. On the shortest spanning sub-
Kramer, Gregory Sorkin and Amit Sharma for their tree of a graph and the traveling salesman problem.
contributions to the ORCA system. The authors would Proc. Amer. Math. SOL, 7:48-50, 1956.
also like to thank Chi Ping Hsu from Hughes and Wayne
[ill D. Harrison et al. Data management and graphics
‘The results of the mcncl gate array example were omitted editing in the Berkeley design environment. Proc.
since it was much easier to route than the larger mcnc2 example. ICCAD, Nov 1986.
3 VAX 8650 cpu minutes

Paper 8.4
127

You might also like