ORCA
ORCA
ORCA
all A
ui
A
A ,I_ 1 the cut are constrained to lie inside respective halves of
the layout space. The two resulting pieces will then be
partitioned with a cut line perpendicular
The floorplanning
to the first.
hierarchy can now be represented as
a tree with the leaf nodes being groups of cells and the
internal nodes representing network cuts. The cuts are
Figure 1: Porous macro-cell continued recursively with the cut directions changing
at every level of the tree until the groups of cells at the
leaves are small.
When partitioning at lower levels of the tree, one
2 Overview of the CIRCA algo- should consider the effects of the locations of the neigh-
rithm boring cells on the cut being done. In our method, all
cuts that reside at the same level of the tree with the
The ORCA system consists of a floorplanner, a placer, a same cut direction and on the same horizontal or verti-
global router, a spacing requester, a macro adjuster and cal line are done simultaneously. At each step the cut
an area router. The floorplanner partitions the logic cir- line is defined so that it cuts a sequence of groups that
cuits into groups of ceils and assigns the cehs to areas extend entirely from one side of the layout to the other.
of the layout. The placer places the logic circuits within Figure 2 illustrates the idea; the horizontal cuts a, b,
each layout area. The global router performs a “coarse” c and d are all considered simultaneously. We found
routing of the chip and locates the routing areas that that this method is better than considering partitions
are too congested. The spacing requesler decides which sequentially with a pin propagation scheme [4].
areas of the layout need to be expanded or contracted It has also been noticed that the quality of a cut de-
based on the global routing congestion information. The pends on the parts of the layout whose locations have
placemenf adjus1er readjusts the location and placement already been determined. The cells and i/o pads that
of cell groups to accommodate the requested spacing have already been placed are allowed to exert an in-
changes. The global routing and placement improve- fluence on the cut being performed. At every cut the
ment sequence is continued until it is determined that area of the two groups is balanced, thus at the end of
Paper 8.4
123
-
+
no
qo
a
0
no
C
Macro 1
P
El Macro
rtical cut
Horizpntal cu t
+ - 7 --
d
In
rY Ttical ‘;“1 Macro 4 kacro 3]
Paper 8.4
124
action of the global router and the placer makes the each tile adjacent to the edge. The expandability prop-
layout appear like a large “porous” structure. This is erty is used to identify edges that have variable routing
because area is added in regions of high routing density, capacity which designate variable width channels.
and as a result the wires are allowed to form paths that A routed path of a net n, P,, is a tree in G* that
minimize their length and need not be detoured due to connects the terminals of the net. A routing subpath
routing congestion. P C P, is a connected path whose endpoints in P, are
either terminals or vertices with more than 2 edges.
Given the routing model, the objective of the router
5 Area Router is to find routing paths for each net such that no edge
e E E* has demand(e) > capacity(e).
The general purpose ORCA area router is intended for Approximations to Steiner trees are used for initial
a wide variety of routing problems which includes sea- global routing to minimize total wire length. A greedy
of-gates, macro-cell and gate-array routing. A powerful method [lo] is used to compute the approximation to
feature of this router is that it is a general channelless the Steiner tree. The initial global routing is pre-
router that handles over-the-cell routing and routing of formed without regard to congestion; only interconnec-
2, 3, or more layers of interconnect. The router is given a tion length is minimized. The next phase of global rout-
set of terminals to interconnect and a description of the ing addresses the routing congestion.
blockages, which can be of any shape and on any rout- A rip-up-and-reroute algorithm is used to move rout-
ing layer. The area router assumes a preferred routing ing from highly congested routing areas. Highly con-
direction for each routing layer. gested areas are identified by the edges in the global
The area router is based on the well known short- routing graph which have routing demands that exceed
est path algorithm commonly called the mazerouter [S]. their capacities. The rip-up-and-reroute algorithm se-
The shortest path algorithm is used for computing paths lects the most congested edge in the global wiring graph,
in the global routing, detailed routing, and intermedi- randomly selects a global wire subpath which traverses
ate routing phases. The global routing algorithm se- the edge, and deletes the wire path from the database,
lected was that used at IBM [7] with further enhance- and then tries to find a better wiring path using a short-
ments suggested by Nair [s]. The global routing algo- est path algorithm The mazerouter or minimal cost
rithm uses a shortest path algorithm which penalizes path search algorithm [6] is directed by a cost func-
paths crossing heavily congested areas, and uses rip- tion which assigns high cost to heavily congested edges.
up-and-reroute to reduce order dependency of the ini- The algorithm is greedy. Only wiring paths with lower
tial routing. The global routing results were favorable. wiring costs are accepted. The rip-up-and-rerouting
In the ORCA router, this idea of using a mazerouter is continued until none of the capacities of each global
combined with rip-up-and-reroute has been extended wiring edge are saturated.
down into detailed routing by successively refining the The cost function used in the shortest path algorithm
initial global routing grid into finer and finer grids until is linear, i.e. the cost of a path is the sum of the cost of
each global routing cell becomes a detailed routing grid each component (or edge) in the path. The edge cost is
as in [9]. Thus, we have one simple general algorithm a function of its length penalized when the edge is over
that is used for both global routing and detailed routing. saturated. In mathematical terms, the objective of the
The area routing algorithm uses the classical global path search algorithm is,
routing model. The routing area is physically divided by
a set of horizontal and vertical cut lines into a rectangu- min cost(path) = min c edgecost(
paths in G*
lar matrix of routing tiles. The cut lines are positioned eEpath
such that they coincide with channel boundaries ‘. Each
routing tile is a face in the graph G = (V, E). The rout- edgecost =
ing algorithm uses the dual graph G* = (V”, E*). Each length(e) + f(overflow(e), expandability(e)),
edge e E E* has the following properties: length, rout-
ing capacity, routing demand, and expandability. The
length of an edge e E Ek is the center-to-center distance overflow(e) = demand(e) - capacity(e).
of the tiles corresponding to the endpoints of e. The f(owerfZow(e), expandability(e)) is the penalty func-
routing capacity of an edge of E is maximum number tion for overflowed edges (,e,fZo,(e) > 0) that also
of feasible routes (on all layers of the preferred direc- takes into account the expandability of edge e.
tion) that can cross the corresponding edge in the dual The global wiring model is “refined” by subdividing
graph E. The edge capacity is computed by counting each global routing row and column into smaller sub
the number of routing tracks that span the midpoint of rows and subcolumns. Figure 4 shows the added cut
‘In cases where there are no channels, the cut lines are posi-
lines (shown in dotted lines) in the refined global rout-
tioned as close as possible to edges of the boundaries of instances ing graph. At each iteration of global routing, the re-
of circuits used in the design. finement of the global wiring model will give a more
Permission to copy without fee all orpart of this material is granted provided that the copies are not made or distributed for direct commercial
advantage, the ACM copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission
of the Association for Computina Machinery. To co~v otherwise. or to republish. reauires a fee and/or suecific oetission. Paper 8.4
125
exact picture of where the net paths should go. At the tially reduced. The cost function penalizes, in order of
very last stage of refinement, each global routing tile highest-to-lowest severity, blockages, crossovers, over-
would correspond to the intersection of one vertical and laps, wrongway wires, and finally vias.
one horizontal wiring track. The nets are then routed in
E:ven though the complexity of the algorithm is kept
the refined global wiring model using the shortest path
low by the windowing scheme, the memory requirements
search algorithm, confined to the path suggested to the
grow geometrically with each refinement of the global
higher level global route, without regard to congestion.
wiring graph. The solution to the memory problem is
to subdivide the routing Iproblem into independent sub-
problems. The routing graph G is partitioned at user
specified horizontal and vertical cut lines (Figure 5).
The problem then becomes the assignment of routing
paths which cross the partition boundaries. New pseudo
terminals are assigned to locations along the boundaries
for these routing paths. Bach pseudo terminal along one
edge e E Enboundary edges is given a termAssignCost
for each possible location along the edge, where the
most “desired” location of a terminal is given the lowest
termtissigncost. Nets which have terminals very near
edge e are given the highest priority termdssigncost.
Figure 4: Ripup and reroute path search windowing The assignment algorithm assigns each pseudo termi-
nal to a location which minimizes the total sum of each
To reduce the complexity of the shortest path search terminal/location cost.
algorithm, the higher level global routing paths are used
The assignment of terminals along partition bound-
as a guide to limit the domain of the lower level path
aries introduce extra artifical constraints that degrade
search algorithm. The path search is limited by the
the detailed routing process. By experiment, we found
wider path suggested by the higher level routing., This
that along these boundaries, we had many wiring vio-
reduces the complexity of the path search algorithm to
lations after detailed wiring. To solve this problem, we
O(length of path) instead of O((Zength of path)2). Fig-
defined new partition boundaries that are offset from
ure 4 shows how a higher level global wire path i;s used
the original boundares, merged the routing across the
as a guide in the refined global routing graph.
original boundaries and rerouted wires along the merged
In the event that overflows still exist, further iter-
boundaries.
ations of rip-up-and-reroute are performed on an en-
larged window extending past the path suggested by To handle three and more layers of routing, the area
the higher level routing. An extra component is added router assigns routing subpaths to HV layer pairs just
to the cost function to guide the mazerouter to prefer before the global routing model is refined to the last
the path suggested by the higher level routing. most detailed stage. The global routing model is dupli-
When the global wiring model is refined to the point cated for each layer, and the capacities of each edge are
where it matches the detailed routing model, the edge updated to reflect only the routing tracks found on its
cost function is changed to influence the mazerouter in layer. Nets are routed one at a time in the multi-layer
chasing vias, wrong-way-wires, and overlaps. As in the routing model in a first come first serve manner: the
coarse routing model, an overflow in the detailed rout- lowest HV layer pair is filled first, then the next HV
ing phase is present when the capacity along an edge layer pair, and so on.
exceeds the demand. However in the detailed rout-
ing model, the capacity along an edge is either one or
zero, and when an overflow occurs this means a wire
going through a blocked region (i.e. capacity = 0 and
demand >= 1) or a wire shorting another wire (i.e.
capacity = 1 and demand >= 2). The cost func-
tion used in the detailed routing phase distinguishes be-
tween these two cases, and prefers the latter. Also there
are two types of overflows, “crossovers” and “overlaps”.
The crossovers occur when a net shorts another net by
going from one side of the shorted net to the other. An
overlap is defined when a net shorts another net but
is not a crossover. By experimental analysis, we found
Figure 5: Hierarchical decomposition
that when we allow overlaps rather than crossovers, the
number of overflows after rip-up-and-reroute is substan-
Paper 8.4
126
6 Results total net length
The ORCA system has been implemented in the C
under the UNIX operating system. It is integrated
with the Berkeley CAD Design Environment [ll]. The
program was run on a variety of examples shown in
Table 1. The randomly generated random1 example
was the most difficult to route due to high connec- Table 2: Results on mcnc2 gate array example
tions/cell ratio. The mcncl and mcnc2 test cases are
the gate array benchmark examples from the 1988 Inter-
national Placement-and-Routing Workshop at the Mi- Christopher for providing some of the sea-of-gates ex-
croelectronics Center at North Carolina (MCNC). The amples.
hughesl and hughes2 examples are from industrial sea
of-gates designs. The successful routing of the hughes2
example, with close to 20,000 cells and over 20,000 nets, References
demonstrates the feasibility of the area router to handle
A. Hui et al. A 4.lk gates double metal hcmos
large problems. The smaller examples were routed with
sea of gates array. Proc. IEEE CICC, :15-17, May
3 levels of global routing refinement hierarchies, and the
1985.
large hughes2 was routed with 4.
Direct comparisons with other sea-of-gates systems PI B. W. Kernighan and S. Lin. An efficient heuris-
could not be made for any of the examples. However, tic procedure for partitioning graphs. Bell Systems
we could make comparisons of different systems on the Technical Journal, 49:291-307, Feb 1970.
standard gate array benchmark examples mcncl and
mcnc2 from the MCNC workshop. Table 2 compares t31 C. M. Fiduccia and R. M. Mattheyses. A linear-
different place and route systems by the total net length time heuristic for improving network partitions.
of the routing of the mend example 2. Even though P~oc. 19th Design Automation Conference, 1982.
there is a slight variation in the performance (measured
[41 A. E. Dunlop and B. W. Kernighan, A procedure
by total net length) of the place and route systems, all for Placement of Standard VLSI Circuits. IEEE
the systems produced the desired result: a fully routed Bans. on Computer-Aided Design, CAD-492-98,
gate array chip. This comparison demonstrates that 1985.
ORCA can do a reasonable job on standard row-based
gate arrays as well as handle se&of-gates. PI L. Stockmeyer. Optimal orientations of cells in
slicing floorplan designs. Information and Control,
57(3):91-101, June 1983.
name #nets #cells #layers
[61 C. Lee. An algorithm for path connection and its
applications. IRE Bans. Electronic Computers,
EC-10:346-365, 1961.
The authors would like to express thanks to Alan [lOI Jr. J. B. Kruskal. On the shortest spanning sub-
Kramer, Gregory Sorkin and Amit Sharma for their tree of a graph and the traveling salesman problem.
contributions to the ORCA system. The authors would Proc. Amer. Math. SOL, 7:48-50, 1956.
also like to thank Chi Ping Hsu from Hughes and Wayne
[ill D. Harrison et al. Data management and graphics
‘The results of the mcncl gate array example were omitted editing in the Berkeley design environment. Proc.
since it was much easier to route than the larger mcnc2 example. ICCAD, Nov 1986.
3 VAX 8650 cpu minutes
Paper 8.4
127