Research Article: Efficient Congestion Mitigation Using Congestion-Aware Steiner Trees and Network Coding Topologies

Download as pdf or txt
Download as pdf or txt
You are on page 1of 9

Hindawi Publishing Corporation VLSI Design Volume 2011, Article ID 892310, 9 pages doi:10.

1155/2011/892310

Research Article Efcient Congestion Mitigation Using Congestion-Aware Steiner Trees and Network Coding Topologies
M. A. R. Chaudhry, Z. Asad, A. Sprintson, and J. Hu
Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX 77843, USA Correspondence should be addressed to M. A. R. Chaudhry, [email protected] Received 26 December 2010; Accepted 5 February 2011 Academic Editor: Zhuo Li Copyright 2011 M. A. R. Chaudhry et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. In the advent of smaller devices, a signicant increase in the density of on-chip components has raised congestion and overow as critical issues in VLSI physical design automation. In this paper, we present novel techniques for reducing congestion and minimizing overows. Our methods are based on ripping up nets that go through the congested areas and replacing them with congestion-aware topologies. Our contributions can be summarized as follows. First, we present several ecient algorithms for nding congestion-aware Steiner trees that is, trees that avoid congested areas of the chip. Next, we show that the novel technique of network coding can lead to further improvements in routability, reduction of congestion, and overow avoidance. Finally, we present an algorithm for identifying ecient congestion-aware network coding topologies. We evaluate the performance of the proposed algorithms through extensive simulations.

1. Introduction
In almost any VLSI design ow, global routing is an essential stage that determines the signal interconnections. Therefore, the capability of the global router may signicantly aect the design turn-around time. Moreover, the results of the global routing stage impact many circuit characteristics, such as power, timing, area, and signal integrity. Global routing poses major challenges in terms of the ecient computation of quality routes. In fact, most of the global routing problems, even special cases, tend to be NP complete [1, 2]. In the advent of smaller devices, a signicant increase in the density of on-chip components results in a larger number of nets that need to be routed, which, together with more stringent routing constraints, results in increasing congestion and overow. In this paper, we propose novel techniques for congestion avoidance and overow reduction. Our methods are designed for the rip-up-and-reroute phase of the global routing stage. At this stage, all the nets have already been routed using a standard prerouting technique however some of the nets need to be rerouted due to high congestion and overow. Our methods are based on ripping up nets that go through congested areas and replacing them with

congestion-aware topologies. The proposed techniques facilitate even distribution of the routing load along the available routing areas. We propose ecient algorithms for nding congestion-aware Steiner trees that favor uncongested routes. In addition, we use the novel technique of network coding for further reduction of congestion and overow avoidance. 1.1. Congestion-Aware Steiner Trees. The major goal of congestion-aware Steiner tree routing is to nd a tree that connects the required set of nodes (pins of a net) while avoiding congested areas with a minimum penalty in terms of the total wirelength. In addition, the running time of the routing algorithm should scale well with the growing number of nets. These requirements pose several challenges in terms of the algorithm design. The rst challenge is to select a cost function that adequately captures the local congestion conditions at the edges of the routing graph. Next, the algorithm should nd a minimum cost tree within acceptable running time. Since nding a Steiner tree is an NP-complete problem, the algorithm needs to use an approximation scheme or employ a heuristic approach. Finally, the proposed algorithm should ensure that the overall performance of the rip-up-and-reroute phase is satisfactory

2
s1 a t1 a s2 a b s1 a a t1 b b

VLSI Design
s2 b

s1

t1

s2

s1 a

t1 a b

s2

t2 (a)

t3

t2

a (b)

t3

t2

b (c)

t3

t2

b (d)

t3

Figure 1: (a) The underlying routing graph with two source nodes s1 , and s2 and three terminals t1 , t2 , t3 . (b) A rectilinear Steiner tree that connects source s1 to all terminals. (c) Two rectilinear Steiner trees that connect sources s1 and s2 to all terminals. (d) A network coding solution.

in terms of congestion mitigation and overow reduction. In this paper, we evaluate several cost functions which take into account various factors such as wire density, overow, and congestion history. We propose several ecient algorithms for Steiner tree routing and compare their performance. Our algorithms are based on known approximations for the Steiner tree problem, heuristic methods, and articial intelligence techniques. 1.2. Network Coding. The basic idea of the network coding technique [3] is to enable the intermediate nodes to generate new signals by combining the signals arriving over their incoming wires. This is in contrast to the standard approach, in which each node can only forward or duplicate the incoming signals. For example, consider a routing instance depicted in Figure 1(a). In this example, we need to route two nets, one connecting source s1 with terminals t1 , t2 , and t3 and the other connecting source s2 with the same set of terminals. The underlying routing graph is represented by a grid as shown in Figure 1(a). Suppose that due to congestion, each edge of this graph has a residual capacity of one unit, that is, each edge can accommodate only a single wire. It is easy to verify that using traditional Steiner tree routing, only one net can be routed without an overow. For example, Figure 1(b) shows a possible routing of a net that connects s1 with terminals t1 , t2 , and t3 . In contrast, Figure 1(c) shows that routing of both nets results in an overow. In this example, two nets transmit dierent signals, a and b, over separate Steiner trees. Figure 1(d) shows that the network coding approach allows to route both nets without overows. With this approach, the terminal t1 creates a new signal, a b, which is delivered to terminals t2 and t3 , while the signals a and b are delivered to terminals t2 , and t3 directly. It is easy to verify that each terminal can decode the two original symbols a and b. The network coding technique oers two distinct advantages. First, it has a potential of solving dicult cases that cannot be solved using traditional routing. For example, for the routing instance shown in Figure 1, the traditional routing results in an overow of value 1, whereas with the network coding technique there are no overows. Second, the network coding technique can decrease the total wirelength. For example, in the routing instance shown in Figure 1 the total wirelength for the traditional routing

solution is 8, whereas for the network coding solution, the total wirelength is 7. 1.3. Previous Work. In the past decades, researchers have strived to improve the performance of global routing algorithms (see, e.g., [46] and references therein). To handle the complexity of large-scale global routing, multilevel routing techniques are proposed in [7, 8]. Recently proposed BoxRouter [9, 10] is based on progressive integer linear Programming (ILP) and rip-up-and-reroute techniques. A fast routing method is presented in [11]. Reference [12] proposes an approach based on the Lagrangian multiplier technique. An eective edge shifting technique is presented in [13]. Most of these previous works adopt the rip-upand-reroute strategy. However, they usually reroute one path (i.e., a 2-pin connection) at a time. In contrast, our method reroutes entire multipin nets. We also propose to use network coding techniques to further reduce congestion and eliminate overows. 1.4. Contribution. The paper makes the following contributions. First, we propose several algorithms for nding ecient congestion-aware Steiner trees. Second, we show that the novel technique of network coding can lead to further improvements in routability, reduction of congestion, and overow minimization. Finally, we provide an algorithm for identifying ecient congestion-aware network coding topologies. We evaluate the performance of the proposed algorithms through extensive simulations.

2. Model
In this paper, we adopt the most commonly used global routing model. The routing topology is represented by a grid graph G(V , E), where each node v V corresponds to a global routing cell (GCell) [10, 12] and each routing edge e E corresponds to a boundary between two adjacent GCells. A set S = {n1 , n2 , . . . , n|S| } of nets are to be routed on this graph. Each net ni S connects a source node si with terminal nodes Ti = {t1 , t2 , . . . , t|Ti | }. If there is a wire connection between two adjacent GCells, the wire must cross their boundary and utilize the corresponding routing edge. Each routing edge e E has a certain routing capacity c(e) which determines the number of wires that can pass through

VLSI Design this edge. We denote by (e) the number of wires that are currently using edge e. 2.1. Global Routing Metric. The goal of a global router is to minimize congestion. Some of the important metrics for a global router are dened as follows. (i) Overow. For each edge e E, the overow ov(e) of e is dened as ov(e) =
(e) c(e), 0,

3 History-Based Cost Function. This cost function assigns cost to an edge based on its congestion history [10, 12]. Specically, each edge is associated with a parameter he that species the number of times the edge has been overowed during the previous iterations of the algorithm. That is, each time the edge with an overow is used, the parameter he is incremented by one. Then, the modied cost (e) of the edge is dened as follows: (e) = 1 + he (e). (8)

if (e) > c(e), (1) otherwise.

The maximum overow ovmax is dened as ovmax = max ov(e).


eE

(2)

Total overow ovtot is dened as ovtot =


eE

ov(e).

(3)

(ii) Wirelength. Total wirelength wlen is dened as wlen =


eE

(e).

(4)

(iii) Density. The density d(e) of edge e E is dened as d(e) = (e) . c(e) (5)

2.2. Cost Functions. Our algorithms associate each edge e E in the graph with a cost function (e) which captures its congestion and overow. The cost of the tree is dened as the sum of the costs of all of its edges. Our goal is to identify trees that go through congested areas and replace them by Steiner trees or network coding topologies that go through areas with low congestion. In this work, we consider several cost functions, described below. Polynomial Cost Function. We propose a cost assignment function, where the cost of an overowed edge is a polynomial function of the sum of its density and overow. Formally, our proposed cost function is dened as follows: (e) = (d(e) + ov(e)) , (6)

Here, (e) is either the polynomial cost function (6) or exponential cost function (7). If the density of the edge is less than or equal to one, the parameter he is initially set to zero. Since we focus on the rerouting phase, we assume that for each net ni S, there exists a Steiner tree i which connects all nodes in ni . Given a set of trees { | ni S}, we can determine the values of ov(e), (e) for each edge e E and identify the set of congested nets S S. A net ni S is referred to as congested if its Steiner tree i has at least one edge with overow. We propose a two-phase solution for rerouting congested nets using congestion-aware topologies. In the rst phase we iteratively rip up each net ni S and reroute it using a congestion-aware Steiner tree with the goal of minimizing the maximum overow ovmax and the total overow ovtot . In second phase, we deal with the nets that remain congested at the end of the rst phase and rip-up-and-reroute pairs of congested nets using congestion-aware network coding topologies to further reduce congestion and minimize the number of overows. Note that the nets considered in phase two correspond to dicult cases, where congestion avoidance was not possible even after several attempts of ripping up and rerouting individual nets. Therefore in the second phase, we consider the pairs of congested nets for further improvement. The example given in Figure 1 shows the advantage of using network coding topologies for routing pairs of nets over using standard routing techniques that handle each net separately.

3. Congestion-Aware Steiner Trees


In this section, we present several techniques for nding congestion-aware Steiner trees. Our goal is to nd Steiner trees that minimize congestion with a minimum penalty in terms of the overall wirelength. We would like to achieve better tradeos between congestion mitigation and total wirelength. These tradeos are useful for practical applications as in some cases, congestion mitigation is preferable to wirelength reduction, whereas in other cases, the wirelength reduction is of higher priority. 3.1. Previous Work on Steiner Tree Routing. The Steiner tree problem is a well-studied NP-complete problem [14]. There is a wealth of heuristic and approximate solutions that have been developed for this problem. The best-known approximation algorithm has an approximation ratio of 1.55 (i.e., the cost of the tree returned by the algorithm is less than 1.55 times the optimum) [15]. The best known

where is a constant which determines the relative penalty for the congested edges. Exponential Cost Function. We use the cost assignment function proposed by [12]. With this cost assignment, the cost of an edge is an exponential function of its density (e) = d(e)
Exp (d(e) 1)

if d(e) > 1, (7) otherwise,

where is a constant which determines the penalty for overowed edges.

4 approximations require signicant computation time, so we focus on computationally feasible and easy to implement approximation and heuristic solution for constructing Steiner trees. 3.2. Algorithms for Finding Congestion-Aware Trees. As mentioned above, our goal is to rip up and reroute nets that use congested edges of G(V , E). For each net ni S which has been ripped up, we need to nd an alternative Steiner tree that uses uncongested routes. In this section, we describe ve algorithms for nding congestion-aware Steiner trees. The rst three algorithms use combinatorial techniques (see, e.g., [1, 16, 17]) while, the last two are based on the intelligent search techniques [18]. The performance of the algorithms is evaluated in Section 5. Algorithm stTree1. This algorithm approximates a minimum cost Steiner tree by using a shortest path tree. A shortest path tree is a union of the shortest paths between source si and a set of terminals Ti . A shortest path tree can be identied by a single invocation of Dijkstras algorithm. However, the cost of the tree may be signicantly higher than the optimum. Algorithm stTree2. This algorithm constructs the tree in an iterative manner. We iteratively process the terminals in Ti in the increasing order of their distance form si . More specically, we rst nd a shortest path P1 between source si and terminal t1 . Then, we assign a zero cost to all edges that belong to P1 and nd a shortest path P2 between s and t2 with respect to modied costs. The idea behind this algorithm is to encourage sharing of the edges between dierent paths. That is, if an edge e belongs to P1 , it can be used in P2 with no additional cost. In general, when nding a shortest path to terminal t, all edges that belong to paths of previously processed terminals are assigned a zero cost. This algorithm requires |T | 1 iterations of Dijkstras algorithm, but it typically returns a lower-cost tree than Algorithm stTree1. Algorithm stTree3. This is a standard approximation algorithm with the approximation ratio of 2 (i.e., the cost of the tree returned by the algorithm is at most two times higher than the optimal cost). Specically, with this algorithm, we nd a shortest path between each pair of nodes in the set T = {T {si }}. Then, we construct a complete graph G such that each node in G corresponds to a node in T. The weight of an edge e G is equal to the minimum length of the path between two corresponding nodes in T. The algorithm then nds a minimum spanning tree in G . Next, each edge in is substituted by the corresponding shortest path in G, which results (after removing redundant edges) in a Steiner tree in G that connects source si with terminals in Ti . Algorithm stTree4. Algorithm stTree4 is an intelligent search-based algorithm. Our approach is inspired by Algorithm A . Algorithm A is a shortest path algorithm that uses a heuristic function (v) to determine the order of visiting nodes of the graph in order to improve its running time. Specically, for each node v, we dene (v) to be the

VLSI Design maximum distance between node v and a terminal t Ti which has not yet been visited. The distance between v and t Ti is dened as the minimum number of hops that separate v and t in G. The Algorithm stTree4 follows the same steps as Algorithm stTree1, but it uses Algorithm A with heuristic function (v) to nd shortest paths. Algorithm stTree5. Algorithm stTree5 is also based on Algorithm A . It follows the same steps as Algorithm stTree2, but it uses Algorithm A with the same heuristic function (v) as in Algorithm stTree4.

4. Network Coding Techniques


In this section, we use the network coding techniques in order to achieve further improvement in terms of minimizing congestion and reducing the number of overows. The network coding technique enables, under certain conditions, to share edges that belong to dierent nets. For example, in the graph depicted in Figure 1(c), there are two minimum Steiner trees one transmitting signal a from source s1 and the second transmitting signal b from source s2 . These two trees clash at the middle edge (emanating from t1 ), resulting in an overow. This conict can be resolved by coding at node t1 , which eectively allows two trees to share certain edges. Similarly, our algorithm will identify pairs of nets that share terminals and then apply network coding techniques to reduce overow. 4.1. Previous Work on Network Coding. The problem of routing of multiple nets with shared terminals is related to the problem of establishing ecient multicast connections in communication networks. The network coding technique was proposed in a seminal paper by Ahlswede et al. [3]. It was shown in [3, 19] that the capacity of the multicast networks, that is, the number of packets that can be sent simultaneously from the source node s to all terminals, is equal to the minimum size of a cut that separates s from each terminal. Li et al. [19] proved that linear network codes are sucient for achieving the capacity of the network. In a subsequent work, Koetter and M dard [20] developed an algebraic framework e for network coding. This framework was used by Ho et al. [21] to show that linear network codes can be eciently constructed through a randomized algorithm. Jaggi et al. [22] proposed a deterministic polynomial-time algorithm for nding feasible network codes in multicast networks. An initial study of applicability of network coding for improving the routability of VLSI designs appears in [23]. In [24], Gulati and Khatri used network coding for improving the routability of FPGAs, focusing on nding nets that are suitable for network coding. To the best of our knowledge, this is the rst work that proposes ecient algorithms for nding the congestion-aware network coding topologies for VLSI circuits. 4.2. Network Coding Algorithm. We proceed to present the algorithm we use for constructing congestion-aware coding networks that reduce congestion and overow.

VLSI Design

Algorithm NC (G, si , s j , Ti j , Ti , T j ): (1) Find a Steiner tree i connects si to terminals in Ti (2) Find a Steiner tree j connects s j to terminals in T j (3) For each link e Ti T j update (e) (4) Find a Steiner tree that connects si to terminals in Ti j (5) Assign zero cost to all edges in (6) For each terminal t Ti j let d(t) be the shortest distance between s j and t (7) For all terminals t Ti j in the increasing order of d(t) do: (8) Let G (V , E ) be a graph formed from G(V , E) by reversing all edges in Pi,t , where Pi,t is a path from si to t in (9) Find shortest path P j,t from source s j to terminal t in G (V , E ) (10) Assign zero cost to all edges of P j,t (11) For each edge e(v, u) P j,t do (12) If there exists an edge e (u, v) , remove e (u, v) from (13) Otherwise, add e(v, u) to (14) Return , i , and j Algorithm 1: Algorithm NC.

The algorithm includes the following steps. First, we identify the subset S of S that includes nets that go through edges with overow. Second, we identify pairs of nets in S that share at least three common terminals. Next, we check, for each such pair of nets (ni , n j ), whether we can replace the Steiner trees for ni and n j by a more ecient routing topology with respect to congestion and overow. More specically, let (ni , n j ) be a pair of nets in S that share at least three terminals. Let si and s j be the source nodes of these nets. We denote the set of terminals shared by ni and n j by Ti j . We also denote by Ti the set of terminals in Ti that do not belong to Ti j that is, Ti = Ti \ Ti j . Similarly, we denote by T j the set of terminals in T j that do not belong to Ti j that is, T j = T j \ Ti j . Next, we nd two congestion-aware Steiner trees i and j that connect si to Ti and s j to T j . These trees can be identied by one of the algorithms presented in Section 3. The parameter (e) for each e G is updated after nding i and j . Finally, we nd a congestion-aware network coding topology that connects si and s j to the common set of terminals Ti j in an iterative manner. First, we let be a Steiner tree with source si and terminals Ti j . All edges of are always assigned zero cost. We then sort the terminals Ti j in the increasing order of their distance (in the original graph) from s j and process them in that order. For each terminal t Ti j , we reverse all the edges in the path Pi,t between source si and terminal t and nd a shortest path P j,t between source s j and terminal t. Then, for each link e(v, u) P j,t we perform the following procedure. If there exists a link e (u, v) in , we remove e (u, v) from otherwise, we add e(v, u) to . A sample execution of this procedure is shown in Figure 2. It is easy to verify that the algorithm produces a feasible network coding topology that is, a topology that ensures that for each terminal t Ti j there are two-edge disjoint paths that connect si and s j with t. The formal description of algorithm for identifying the network coding topology, referred to as Algorithm NC, is given in Algorithm 1.

After the execution of the algorithm, we determine whether the total cost of , i , and j is less than the total cost of the original Steiner trees for nets ni and n j . If there is a reduction in terms of cost, the two original Steiner trees are replaced by , i , and j . Our experimental results, presented in Section 5, show that the number of coding opportunities is relatively small. However, by applying the network coding technique on a limited number of nets, we can achieve a signicant reduction in the number of overows. Also, since the network coding technique is applied to a limited number of nets, the overhead in terms of the number of additional required gates is relatively small.

5. Performance Evaluation
We have evaluated the performance of our algorithms using the ISPD98 routing benchmarks [25]. All the experiments are performed on a 3.2 GHz Intel Xeon dual-core machine. In all experiments, we rst run the Steiner tree tool Flute [26] in order to determine the initial routing of all nets in the benchmark. Next, we perform an iterative procedure, referred to as Phase 1, which processes each net with overows and checks whether an alternative Steiner tree of lower cost and with smaller number of overows exists, and if yes, it rips up the existing tree and replaces it with an alternative one. This phase uses one of the algorithms described in Section 3. Phase 1 terminates when four subsequent iterations yield the same cost and the number of overows, indicating that further reduction in the number overows is unlikely. Next, we check whether the application of the network coding technique can further reduce the number of overows. This phase is referred to as the Phase 2. We rst identify pairs of nets that have overowed edges and share at least there terminals. We then apply Algorithm NC, presented in Section 4, to nd an alternative network coding topology and perform rip-up and reroute if such a topology is benecial in terms of reducing congestion and eliminating overows.

6
si 1 t1 1 sj si 1 t1 1 sj si 0 t1 0 sj si 0 t1

VLSI Design
0 sj

t2

1 (a)

t3

t2

1 (b) si

1 0

t3

t2

1 (c) 0

t3

t2

0 (d)

t3

t1

sj

t2

0 (e)

t3

Figure 2: Steps for nding network coding topology using Algorithm NC. (a) A graph G with two nets ni and n j that connect nodes si and s j to terminals T1 = T2 = T12 = {t1 , t2 , t3 }. (b) A Steiner tree connecting s j to T2 . (c) Modied graph in which the costs of all edges found in are set equal to zero and the edges connecting si to t1 are reversed. A shortest path from s j to t1 is shown. (d) Modied graph in which the edges connecting si to t2 are reversed. A shortest path from s j to t2 is shown. (e) Modied graph in which the edges connecting si to t3 are reversed. A shortest path from s j to t3 is shown.
120 Average total overows 100 80 60 40 20 0 stTree1 stTree2 stTree3 stTree4 stTree5 Algorithm used in Phase 1 Average maximum overow 2 1.8 1.6 1.4 1.2 1 0.8 0.6 0.4 0.2 0 stTree1 stTree2 stTree3 stTree4 Algorithm used in Phase 1 stTree5

Phase 1 Phase 2 (a) Average wirelength (in thousands)


103

Phase 1 Phase 2 (b) 10 9 8 7 6 5 4 3 2 1 0 stTree1 stTree2 stTree3 stTree4 Algorithm used in Phase 1 stTree5

330 325 320 315 310 305 300 295 290 stTree1 stTree2 stTree3 stTree4 stTree5 Algorithm used in Phase 1 Cpu (s) Phase 1 Phase 2 (c)

(d)

Figure 3: Evaluation of ve dierent algorithms for Phase 1 and Algorithm NC for Phase 2 on ISPD98 benchmark les. In all experiments, the polynomial cost function is used. (a) Comparison of the average total overow. (b) Comparison of the average maximum overow. (c) Comparison of the average total wirelength. (d) Comparison of the average running time. Results present average over 10 ISPD98 benchmark les.

VLSI Design

Table 1: Performance evaluation using Algorithm stTree3 for Phase 1 and Algorithm NC for Phase 2 on ISPD98 benchmarks with polynomial cost functions. Phase 1 ovmax 1 0 0 2 0 0 0 0 0 0 Phase 2 ovmax 1 0 0 2 0 0 0 0 0 0

ibm1 ibm2 ibm3 ibm4 ibm5 ibm6 ibm7 ibm8 ibm9 ibm10

ovtot 14 0 0 125 0 0 0 0 0 0

wlen 81058 216299 188391 196106 415681 336105 466381 477404 508661 711201

ovtot 6 0 0 93 0 0 0 0 0 0

wlen 80727 216299 188391 195949 415681 336105 466381 477404 508661 711201

Table 2: Performance evaluation of dierent cost functions (using Algorithm stTree3 for Phase 1 and Algorithm NC for Phase 2) on ISPD98 benchmarks. ovtot 0 0 0 31 0 0 0 0 0 0 Polynomial cost ovmax 0 0 0 1 0 0 0 0 0 0 ovtot 117 79 0 834 0 54 82 37 15 95 Exponential cost ovmax 1 2 0 7 0 2 1 1 1 3 ovtot 0 0 0 439 0 0 0 0 0 0 History-based cost ovmax 0 0 0 4 0 0 0 0 0 0

ibm1 ibm2 ibm3 ibm4 ibm5 ibm6 ibm7 ibm8 ibm9 ibm10

Table 3: Performance evaluation using Algorithm stTree3 for Phase 1 and Algorithm NC for Phase 2 on modied ISPD98 benchmarks (decreased both horizontal and vertical capacity by 1 unit) with polynomial cost function. Flute ovmax 14 22 16 15 4 27 15 13 14 20 Phase 1 ovmax 4 1 1 6 0 1 1 1 1 1 Phase 2 ovmax 3 1 0 6 0 1 0 0 1 1 Number of XOR gates wlen 77231 217737 183391 193131 473265 334163 473743 476864 496409 713553 5 12 2 52 70 0 2 0 24 18

ibm1 ibm2 ibm3 ibm4 ibm5 ibm6 ibm7 ibm8 ibm9 ibm10

ovtot 3257 5678 2308 4833 5 5492 4665 5400 8545 7103

wlen 60209 166193 145777 162844 410038 276012 363678 403502 411524 574743

ovtot 362 18 1 550 0 3 1 3 16 4

wlen 78161 217673 183395 193832 476251 334165 473743 476880 496563 713569

ovtot 184 4 0 440 0 1 0 0 9 1

Table 4: Improvement over MaizeRouter for modied (congested) IBM benchmarks using Algorithm stTree3 for Phase 1 and Algorithm NC for Phase 2, using polynomial cost function. Ver. Cap ibm1 ibm5 ibm8 ibm9 11 21 17 10 Hor. Cap 13 31 28 24 ovtot 43 45326 649 4006 MaizeRouter ovmax 1 25 9 9 ovtot 34 45151 641 3989 Phase 1 and Phase 2 ovmax 1 25 9 9

VLSI Design

(57,58) (58,58) (59,58) (60,58) (61,58) (62,58)

(57,58) (58,58) (59,58) (60,58) (61,58) (62,58)

(57,57) (58,57) (59,57) (60,57) (61,57) (62,57)

(57,57) (58,57) (59,57) (60,57) (61,57) (62,57)

(57,56) (58,56) (59,56) (60,56) (61,56) (62,56)

(57,56) (58,56) (59,56) (60,56) (61,56) (62,56)

(57,55) (58,55) (59,55) (60,55) (61,55) (62,55)

(57,55) (58,55) (59,55) (60,55) (61,55) (62,55)

(57,54) (58,54) (59,54) (60,54) (61,54) (62,54)

(57,54) (58,54) (59,54) (60,54) (61,54) (62,54)

(57,53) (58,53) (59,53) (60,53) (61,53) (62,53)

(57,53) (58,53) (59,53) (60,53) (61,53) (62,53)

a b a+b

si sj Ti j (a)

a b a+b

si sj Ti j (b)

Figure 4: A network coding topology for benchmark ibm1 for pair of nets ni = 66 and n j = 1531 computed using Algorithm NC. There are two nets ni and n j with sources si = (60, 55), and s j = (60, 57) and set of common terminals Ti j = (59, 55), (60, 54), and(57, 55). Part(a) shows routing layout without network coding. Part(b) shows routing layout with network coding. Example shows that network coding has helped to reduce congestion on edges (60, 54), (60, 53), (59, 53), and (58, 53).

The experimental results are shown in Figures 3(a) 3(c). The gures present average performance over all ten benchmarks. The cost function for this set of experiments was set according to (6) with 10. We have observed that larger values of yield fewer overows but result in larger running times and increased wirelength. We also note that for Phase 1, Algorithm stTree3 shows the best performance in terms of reducing the total number of overows as well as reducing the maximum overow. In fact, Algorithm stTree3 eliminates all overows in all benchmarks, except for ibm4 as given in Table 1. We also note that Algorithms stTree4 and stTree5 yield Steiner trees with a smaller total wirelength. This is due to the fact the intelligent search algorithms favor paths that have small hop count. We observe that the network coding technique results in a considerable reduction of the total number of overows as well as reduces the maximum overow. Furthermore, for each pair of nets for which we perform network coding, the number of required gates is small. Moreover, in all cases that we have encountered, the network coding operations can be performed over nite eld GF(2), that is, each encoding node can be implemented with a single XOR gate. Such gates incur minimum overhead, because they can serve as buers for long wires. An example of how network coding can help in reducing the wirelength of two nets in the ibm1 benchmark is shown in Figure 4. Figure 3(d) compares the running times of the dierent Algorithms for Phase 1 and Algorithm NC for Phase 2. As

expected, Algorithm stTree4 is one of the fastest algorithms, whereas Algorithm stTree1 and stTree5 have running times comparable to Algorithm stTree2. Moreover, Algorithms stTree4 and stTree5 are faster than their counterparts (Algorithms stTree1 and stTree2, resp.). This is due to the fact that intelligent search methods speed up the search by preferring nodes closer to the destination. In the second set of experiments, we evaluated the performance of three cost functions mentioned in Section 2.2 on the ISPD98 benchmarks using Algorithm stTree3 for Phase 1 and Algorithm NC for Phase 2. For cost function given by (6), we used 10, whereas for cost function given by (7), we used 50, and for cost function given by (8), (e) was a polynomial cost function with 10. The results are shown in Table 2. Polynomial cost function showed the best performance in terms of overows. In another set of experiments, we worked on slightly modied ISPD98 benchmark les. The modication included reducing the vertical and horizontal capacity by one unit. For Phase 1, we used Algorithm stTree3 and then applied Algorithm NC in Phase 2. Polynomial cost function given by (6) was used to check the performance on these more congested cases. The results are given in Table 3. We have also conducted the same experiment on several selected benchmarks on the output of MaizeRouter [13]. In these experiments, we iteratively reduced the vertical and horizontal capacity of the ISPD98 benchmarks until we got overows while running them through MaizeRouter. Then,

VLSI Design we used the output of MaizeRouter as input to Phase 1 using Algorithm stTree3, and after that, we have applied Algorithm NC in Phase 2. The cost function used was polynomial with = 10. The results are shown in Table 4. The results demonstrate that Algorithms stTree3 combined with Algorithm NC perform well and can contribute to further reduction of the number of overows.

9
global router, in Proceedings of IEEE/ACM International Conference on Computer-Aided Design (ICCAD 07), pp. 503 508, November 2007. M. Pan and C. Chu, FastRoute 2.0: a high-quality and ecient global router, in Proceedings of Asia and South Pacic Design Automation Conference (ASP-DAC 07), pp. 250255, January 2007. J. A. Roy and I. L. Markov, High-performance routing at the nanometer scale, in Proceedings of IEEE/ACM International Conference on Computer-Aided Design (ICCAD 07), pp. 496 502, November 2007. M. D. Mott, MaizeRouter: engineering an eective Global Router, in Proceedings of Asia and South Pacic Design Automation Conference (ASP-DAC 08), pp. 226231, March 2008. M. R. Garey and D. S. Johnson, Computers and Intractability, Freeman, San Francisco, Calif, USA, 1979. G. Robins and A. Zelikovsky, Improved Steiner tree approximation in graphs, in Proceedings of the 11th Annual ACMSIAM Symposium on Discrete Algorithms, pp. 770779, January 2000. S. Vo, Steiners problem in graphs: heuristic methods, Discrete Applied Mathematics, vol. 40, no. 1, pp. 4572, 1992. M. P. D. Arago and R. F. Werneck, On the implementation a of MST-based heuristics for the steiner problem in graphs, in Proceedings of the 4th International Workshop on Algorithm Engineering and Experiments, pp. 115, 2002. R. Dechter and J. Pearl, Generalized best-rst search strategies and the optimality of A , Journal of the ACM, vol. 32, no. 3, pp. 505536, 1985. S. Y. R. Li, R. W. Yeung, and N. Cai, Linear network coding, IEEE Transactions on Information Theory, vol. 49, no. 2, pp. 371381, 2003. R. Koetter and M. M dard, An algebraic approach to network e coding, IEEE/ACM Transactions on Networking, vol. 11, no. 5, pp. 782795, 2003. T. Ho, R. Koetter, M. M dard, D. R. Karger, and M. Eros, e The benets of coding over routing in a randomized setting, in Proceedings of IEEE International Symposium on Information Theory (ISIT 03), p. 442, July 2003. S. Jaggi, P. Sanders, P. A. Chou et al., Polynomial time algorithms for multicast network code construction, IEEE Transactions on Information Theory, vol. 51, no. 6, pp. 1973 1982, 2005. N. Jayakumar, S. P. Khatri, K. Gulati, and A. Sprintson, Network coding for routability improvement in VLSI, in Proceedings of the International Conference on Computer-Aided Design (ICCAD 06), pp. 820823, November 2006. K. Gulati and S. R. Khatri, Improving FPGA routability using network coding, in Proceedings of the 18th ACM Great Lakes Symposium on VLSI (GLSVLSI 08), pp. 147150, March 2008. C. J. Alpert, ISPD98 circuit benchmark suite, in Proceedings of the International Symposium on Physical Design (ISPD 98), pp. 8085, April 1998. C. Chu, FLUTE: fast lookup table based wirelength estimation technique, in Proceedings of IEEE/ACM International Conference on Computer-Aided Design (ICCAD 04), pp. 696 701, November 2004. M. A. R. Chaudhry, Z. Asad, A. Sprintson, and J. Hu, Ecient rerouting algorithms for congestion mitigation, in Proceedings of IEEE Computer Society Annual Symposium on VLSI (ISVLSI 09), pp. 4348, May 2009.

[11]

[12]

6. Conclusions
In this paper, we presented several ecient techniques for rip-up-and-reroute stage of the global routing process. Our techniques are based on ripping up nets that go through highly congested areas and rerouting them using ecient Steiner tree algorithms. We have considered several ecient Steiner tree algorithms as well as several cost functions that take into account congestion and overow. We have also studied the application of the novel technique of network coding and showed that it can eciently handle dicult routing cases, facilitating reduction in the number of overows.
[13]

[14] [15]

[16] [17]

Acknowledgment
A preliminary version of this paper appeared in the proceedings of the 2009 IEEE Computer Society Annual Symposium on VLSI (ISVLSI) [27]. This work was supported in part by NSF grant CNS-0954153.

[18]

References
[1] T. Lengauer, Combinatorial Algorithms for Integrated Circuit Layout, John Wiley & Sons, New York, NY, USA, 1990. [2] N. A. Sherwani, Algorithms for VLSI Physical Design Automation, Kluwer Academic Publishers, Norwell, Mass, USA, 2nd edition, 1995. [3] R. Ahlswede, N. Cai, S. Y. R. Li, and R. W. Yeung, Network information ow, IEEE Transactions on Information Theory, vol. 46, no. 4, pp. 12041216, 2000. [4] J. Hu and S. S. Sapatnekar, A survey on multi-net global routing for integrated circuits, Integration, the VLSI Journal, vol. 31, no. 1, pp. 149, 2001. [5] R. Kastner, E. Bozorgzadeh, and M. Sarrafzadeh, Predictable routing, in Proceedings of IEEE/ACM International Conference on Computer Aided Design (ICCAD 00), pp. 110113, 2000. [6] C. Albrecht, Provably good global routing by a new approximation algorithm for multicommodity ow, in Proceedings of the International Symposium on Physical Design (ISPD 00), pp. 1925, April 2000. [7] J. Cong, J. Fang, and Y. Zhang, Multilevel approach to fullchip gridless routing, in Proceedings of the International Conference on Computer-Aided Design, pp. 396403, November 2001. [8] Y. W. Chang and S. P. Lin, MR: A new framework for multilevel full-chip routing, IEEE Transactions on ComputerAided Design of Integrated Circuits and Systems, vol. 23, no. 5, pp. 793800, 2004. [9] M. Cho and D. Z. Pan, BoxRouter: a new global router based on box expansion and progressive ILP, in Proceedings of the ACM/IEEE Design Automation Conference, pp. 373378. [10] M. Cho, K. Lu, K. Yuan, and D. Z. Pan, BoxRouter 2.0: architecture and implementation of a hybrid and robust

[19]

[20]

[21]

[22]

[23]

[24]

[25]

[26]

[27]

You might also like