
Facet: A Procedure for the Automated Synthesis of Digital Systems¹

1983

Chia-Jeng Tseng and Daniel P. Siewiorek
Departments of Electrical Engineering and Computer Science
Carnegie-Mellon University
Pittsburgh, Pennsylvania 15213

Abstract

In the past decade significant effort has been devoted to the development of methodologies for design at the register-transfer level. However, effective and versatile procedures are still not available. This paper presents an efficient procedure for the automated synthesis of data paths at the register-transfer level. The procedure minimizes the numbers of storage elements, data operators, and interconnection units. In addition, the procedure has the capability of exploring alternatives in the design space. In some preliminary experiments the procedure produced designs nearly identical to commercially produced designs.

1 Introduction

The Carnegie-Mellon University Design Automation (CMU-DA) system has been developed over the past six years [6]. Using the ISPS description [3] as input, the CMU-DA system proceeds through global optimization, design style selection, data-memory allocation, physical module binding, control allocation, chip partitioning, and mask generation phases. This paper describes the results of research in the data-memory allocation phase.

The problem of data-memory allocation includes five subproblems: the specification of data flow and control flow, the allocation of storage elements, the allocation of data operators, the allocation of interconnection units, and the exploration of the design space. The issues of specifying the initial operation sequences are described in Section 2. Given a list of operation sequences (which, in a sense, specifies the performance), the problem of design improvement is concerned with minimizing the numbers of storage elements, data operators, and interconnection units. These three minimization problems can be formulated as the clique-partitioning problem, which is detailed in Section 3. Sections 4, 5, and 6 describe the formulations and present algorithms for generating solutions to each of these three problems. Exploring the design space is the topic of Section 7. Section 8 contains conclusions and suggestions for future work.

¹ This research has been supported by the National Science Foundation under Grant ENG 78-25755. The procedure is mainly concerned with the search for cliques in graphs. The similarity in shape between a facet and a clique led us to use "Facet" as the name of the procedure.

2 Specification of Initial Code Sequences

The input to the data-memory allocator is the value trace (VT) [7, 12]. A VT preserves all the data flow and control flow information in the original ISPS description. This section presents a procedure for specifying the initial code sequences. A basic block is a linear sequence of operation codes having one entry point (the first operation executed) and one exit point (the last operation executed) [1]. A VT basic block is first converted into a two-dimensional list of operation sequences. To preserve the maximum parallelism in a VT, a specific name is assigned to each value in the VT. Taking advantage of the data dependency relationships among different operations, the operation sequences are compacted in an as-early-as-possible (AEAP) manner. The AEAP strategy (also called the first-come first-served strategy) has been applied in microcode compaction and proved to generate near-optimal solutions [5].
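The AEAP placement rule spelled out in the next paragraph can be illustrated with a short Python sketch. It is a minimal reading of that rule under our own assumptions, not the CMU-DA code: each statement is encoded as a hypothetical (destination, expression, sources) triple, a statement is placed one horizontal list after the latest list that defines any of its operands (only read-after-write dependences are tracked, relying on the unique value names mentioned above), and a later statement with the same destination in the same list replaces the earlier, redundant one. Applied to the running example of Table 1, it yields the horizontal lists of Table 2 without the parenthesized statement.

```python
def aeap_compact(statements):
    """AEAP compaction of one basic block (a sketch of the rule in Section 2).

    statements: (dest, expression, sources) triples in program order.
    Returns the horizontal lists (rows) of the compacted code sequence.
    """
    rows = []        # rows[k] holds the statements placed in horizontal list k
    defined_in = {}  # name -> row of its most recent definition
    for dest, expr, sources in statements:
        # Place the statement just behind the latest row defining one of its operands.
        level = 1 + max((defined_in[s] for s in sources if s in defined_in), default=-1)
        while len(rows) <= level:
            rows.append([])
        # An earlier statement with the same destination in this row is redundant.
        # (Like the rule in the text, this ignores any reader of the overwritten value.)
        rows[level] = [st for st in rows[level] if st[0] != dest]
        rows[level].append((dest, expr, sources))
        defined_in[dest] = level
    return rows


# Table 1 (the running example), encoded with the hypothetical triple format.
TABLE_1 = [
    ("V3", "V1 + V2", ["V1", "V2"]),
    ("V5", "V3 - V4", ["V3", "V4"]),
    ("V7", "V3 * V6", ["V3", "V6"]),
    ("V8", "V3 + V5", ["V3", "V5"]),
    ("V9", "V1 + V7", ["V1", "V7"]),
    ("V11", "V10 / V5", ["V10", "V5"]),
    ("V12", "100", []),
    ("V13", "V3", ["V3"]),
    ("V12", "V1", ["V1"]),
    ("V14", "V11 and V8", ["V11", "V8"]),
    ("V15", "V12 or V9", ["V12", "V9"]),
    ("V1", "V14", ["V14"]),
    ("V2", "V15", ["V15"]),
]

if __name__ == "__main__":
    for row in aeap_compact(TABLE_1):
        print(" ; ".join(f"{d} = {e}" for d, e, _ in row))
```

Because statements are visited in program order, every operand's row is already known when the statement that reads it is placed, so one pass suffices for a single basic block.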
Using the AEAP strategy, an operation is moved forward to the horizontal list just behind the horizontal list where (at least) one of its operands is defined. Starting from the entry point of a basic block, the statements are processed one by one. Each time a triple statement (a statement which consists of one operation, one or two sources, and one destination) is processed, its operands are compared with the names defined in the preceding horizontal list. If one of its operands is defined in the preceding list, the operation is located immediately behind it. Once the location of a statement is determined, its destination name is compared with the destination names of the other statements in the same horizontal list. If some statement has the same destination name, the earlier one is redundant and can be eliminated.

Table 1 is a VT-like data flow specification which will be used as a running example throughout the paper. Table 2 is the code sequence obtained by using the AEAP strategy. The statement enclosed in parentheses is redundant and should be eliminated.

Table 1: A VT-like Data Flow Specification

V3  = V1 + V2
V5  = V3 - V4
V7  = V3 * V6
V8  = V3 + V5
V9  = V1 + V7
V11 = V10 / V5
V12 = 100
V13 = V3
V12 = V1
V14 = V11 and V8
V15 = V12 or V9
V1  = V14
V2  = V15

Table 2: Compacted Code Sequences (one horizontal list per line)

V3  = V1 + V2 ;     (V12 = 100) ;     V12 = V1 ;
V5  = V3 - V4 ;     V7  = V3 * V6 ;   V13 = V3 ;
V8  = V3 + V5 ;     V9  = V1 + V7 ;   V11 = V10 / V5 ;
V14 = V11 and V8 ;  V15 = V12 or V9 ;
V1  = V14 ;         V2  = V15 ;

20th Design Automation Conference, Paper 31.4, p. 490. 0738-100X/83/0000/0490$01.00 © 1983 IEEE

3 Some Procedures for Partitioning a Graph into Disjoint Cliques

Let G be a graph consisting of a finite number of nodes and a set of undirected edges connecting pairs of nodes. A nonempty collection C of nodes of G forms a complete graph if each node in C is connected to every other node of C.
A complete graph C is said to be a clique [11] with respect to G if C is not contained in any other complete graph contained in G. The clique-partitioning problem is to partition the nodes of G into a number of disjoint clusters such that each node appears in one and only one cluster and each cluster itself forms a complete graph (clique). Many applications require partitioning a graph into the minimum number of disjoint cliques. Minimization is consistent with finding the cliques in the graph one by one. However, the search for cliques in a graph has been proved to be NP-complete. Related research can be found in [2, 4, 8, 9, 10, 15].

A procedure which partitions a graph into a near-minimum number of cliques is given in this section. The procedure uses the neighborhood property among nodes (described in the next subsection) to partition a graph into a set of disjoint cliques. Its time complexity is a polynomial function of the numbers of nodes and edges in the graph [14]. The procedure has been applied to several graphs and found to generate optimal partitionings. However, the neighborhood property is not a sufficient condition for finding the cliques in a graph, and sometimes a suboptimal solution is generated.

3.1 The Algorithm

Let G be an undirected graph whose nodes are indexed by integers. Assume nodes i and j are connected and i is smaller than j. The edge which joins nodes i and j is represented by the integer pair (i,j), and each of these nodes is a neighbor of the other. If a third node is connected to both of them, it is said to be a common neighbor of the pair.

Two data structures are used to represent the graph. NodeList stores the nodes of the graph. It is a two-dimensional data structure in which each horizontal list contains a number of nodes which form a clique. Initially, each node of the graph occupies its own horizontal list; when several nodes are grouped into a clique, they are coalesced into the same horizontal list. EdgeList stores the edges of the graph. All the edges which have the same "left node" (the node with the smaller index) are linked in a horizontal list, whose indices are sorted in increasing order. The left nodes of all the horizontal lists are linked vertically, again in increasing order.

Given a sorted list of edges of a graph, the following algorithm partitions the nodes of the graph into disjoint clusters, each of which forms a clique.

Algorithm 1:
1. Scan through the list of edges (EdgeList). For each edge (i,j), i<j, compute its number of common neighbors.
2. Pick the edge (p,q) which has the maximum number of common neighbors. Combine the lists of nodes headed by p and q, using the smaller of p and q as the head of the resulting clique.² Update the list of edges of G (as described in Algorithm 2). If the list of edges is empty, the graph partitioning is completed.
3. Assume p is the head of the resulting clique. Pick an edge which joins node p to another node and has the maximum number of common neighbors. Let the edge be (p,r), or (r,p) if r is smaller than p. Combine the lists of nodes headed by p and r, keeping the smaller of the two as the head. Update the list of edges of G (as described in Algorithm 2). If the head (p, or r if r is smaller than p) no longer appears in the EdgeList, go to Step 2 and start to collect the next cluster. Otherwise, repeat Step 3.

² Using the smaller node as the head of the resulting clique is just a matter of convenience; it does not influence the final result.

In Steps 2 and 3, if more than one pair has the maximum number of common neighbors, the number of edges which would be excluded is computed for each such pair, and the pair which excludes the fewest edges is selected. If more than one pair excludes the same number of edges, one is chosen arbitrarily.

The number of common neighbors for each pair of connected nodes can be calculated by inspecting the list of edges. How is the number of edges to be excluded computed when a pair of nodes is grouped together? A node k which is connected to only one of i and j is no longer connected to the composite node (i,j), so the edge (i,k) or (j,k) must be deleted. A node k which is connected to both i and j is still connected to the composite node (i,j), so only one of the edges (i,k) and (j,k) needs to be deleted; for consistency, the edge (j,k) is always the one deleted. Therefore, the number of edges to be excluded can also be computed by inspecting the list of edges. Once a pair of nodes is picked, the edge list is updated as follows.

Algorithm 2:
1. Delete those edges which need to be deleted.
2. Recompute the numbers of common neighbors and the numbers of edges to be deleted for those pairs of connected nodes which remain in the list of edges.

3.2 An Illustrative Example

Let the graph depicted in Figure 1(a) be given. The list of edges, together with the number of common neighbors and the number of edges to be deleted for each pair of connected nodes, is shown in Figure 1(b). For example, these two numbers are computed for (1,2) as follows. Node 3 is the only node which is connected to both nodes 1 and 2, so the number of common neighbors of (1,2) is one. If nodes 1 and 2 are grouped together, the edges (1,2), (2,3), and (2,4) need to be deleted, so the number of edges to be excluded is three.

As indicated in Figure 1(b), the pairs (2,3) and (3,4) have the maximum number of common neighbors and exclude the same number of edges if either pair is combined. Let nodes 2 and 3 be the first pair to be grouped together. Nodes 1 and 4 are connected to both nodes 2 and 3, so they remain connected to the composite node in the reduced graph. Node 5 is connected to only one of the two nodes, so it is not connected to the composite node in the reduced graph. The list of edges of the reduced graph is shown in Figure 1(c).

To reduce the graph in Figure 1(c), the numbers of common neighbors of the edges incident to the composite node are compared. The edges (1,2) and (2,4) have the same number of common neighbors, but choosing (1,2) excludes fewer edges from the graph than choosing (2,4). Therefore, the edge (1,2) is selected. The composite node which contains nodes 1, 2, and 3 is then no longer connected to any other node, so these nodes form a cluster. Repeatedly applying the procedure to the reduced graph, the nodes of the original graph are partitioned into six clusters: {1, 2, 3}, {4, 5}, {6, 7}, {8}, {9}, and {10}.

Figure 1: Graph Used by the Example
(a) The graph, with nodes 1 through 10 (drawing not reproduced).
(b) Edge list (edge: CommonNeighbors, EdgesDeleted): (1,2): 1, 3; (1,3): 1, 3; (2,3): 2, 3; (2,4): 1, 4; (3,4): 2, 3; (3,5): 1, 3; (4,5): 1, 3; (6,7): 0, 3; (7,8): 0, 3; (7,9): 0, 3.
(c) Edge list after grouping nodes 2 and 3: (1,2): 0, 2; (2,4): 0, 3; (4,5): 0, 2; (6,7): 0, 3; (7,8): 0, 3; (7,9): 0, 3.

3.3 Modifying Algorithm 1 to Meet Demands of the Real World

Naive application of the clique-partitioning algorithm to the data-memory allocation problem does not generate good solutions. In this subsection two other notions are introduced to direct the application of the clique-partitioning procedure. One is divide and conquer; the other is the transitive property. The interpretation of these two notions for data-memory allocation is detailed in Sections 4, 5, and 6. Given a graph G, each edge of G represents some kind of relationship between its two nodes.

Let (p,q), (p,r) [or (r,p) if r is smaller than p], and (q,r) [or (r,q) if r is smaller than q] be three edges in G, where p is smaller than q. As indicated in Algorithm 1, if p and q are combined, the composite node is represented by the smaller node p, and the edge (q,r) is deleted from G. Let (p,r), (q,r), and (p,q) belong to three different categories, represented by i, j, and k. Assume that the profits of grouping a pair of nodes in categories i, j, and k are in increasing order. In addition, assume that these three edges have the following form of transitive property, named the generalized transitive property: if nodes p and q are grouped together, then nodes p and r fall into a new category l (lowercase L), whose edges have a better profit measure than the edges in category i. The algorithm is given below.

Algorithm 3: A generalized clique-partitioning algorithm.
1. Scan through the list of edges for category k (Gk). For each (i,j), compute its number of common neighbors.
2. Pick the edge (p,q) from Gk which has the maximum number of common neighbors.
3. Instead of directly applying Algorithm 2 to G and Gk, update G and Gk in the following way. For each node r which is connected to only one of the nodes p and q in G, the edge to r is deleted from G; if that edge is also an edge of category k, it is deleted from Gk. For each node r which is connected to both p and q in G, the edge (q,r) is deleted from G, and also from Gk if it is contained there. Assume that (p,r) is an edge of category i and (q,r) is in the list of edges for category j. Due to the grouping of p and q, the edge (p,r) becomes an edge of category l, and the category identifier of (p,r) is changed to l. If the profit of combining pairs of nodes in category l is the same as or better than that of combining pairs of nodes in category k, the edge is included in Gk. Meanwhile, the number of common neighbors and the number of edges to be excluded are updated for each of the edges remaining in Gk.
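Algorithms 1 and 2 can be made concrete with a short Python sketch; Algorithm 3 would drive the same merge step one category subgraph at a time. This is a minimal illustration under stated assumptions, not the authors' implementation: adjacency sets stand in for NodeList and EdgeList, the number of edges excluded by grouping i and j is counted as deg(i) + deg(j) - 1 - (number of common neighbors), which gives three for the pair (1,2) as in Subsection 3.2, and ties that survive both criteria are broken by taking the smallest pair rather than arbitrarily. On the graph of Figure 1 it yields the six clusters listed in Subsection 3.2.

```python
def partition_into_cliques(nodes, edges):
    """Greedy clique partitioning after Algorithms 1 and 2 (a sketch).

    nodes: iterable of integer node indices; edges: iterable of (i, j) pairs.
    Returns the clusters as sorted lists; each cluster is a clique of the input.
    """
    adj = {n: set() for n in nodes}              # stands in for EdgeList
    for i, j in edges:
        adj[i].add(j)
        adj[j].add(i)
    members = {n: [n] for n in adj}              # stands in for NodeList

    def score(i, j):
        """Sort key: most common neighbours first, then fewest excluded edges."""
        common = len(adj[i] & adj[j])
        # Every edge touching i or j disappears except one per common neighbour.
        excluded = len(adj[i]) + len(adj[j]) - 1 - common
        return (-common, excluded, min(i, j), max(i, j))

    def merge(head, other):
        """Algorithm 2: fold node `other` into `head`, deleting invalid edges."""
        keep = (adj[head] & adj[other]) - {head, other}
        for k in (adj[head] | adj[other]) - {head, other}:
            adj[k] -= {head, other}
            if k in keep:                        # still adjacent to the composite
                adj[k].add(head)
        adj[head] = keep
        del adj[other]
        members[head].extend(members.pop(other))

    while True:
        edge_list = [(i, j) for i in adj for j in adj[i] if i < j]
        if not edge_list:                        # Step 2: partitioning complete
            break
        p, q = min(edge_list, key=lambda e: score(*e))
        merge(p, q)                              # p < q, so p is the head
        head = p
        while adj[head]:                         # Step 3: grow the current cluster
            r = min(adj[head], key=lambda k: score(head, k))
            small, large = min(head, r), max(head, r)
            merge(small, large)
            head = small                         # the smaller index stays the head
    return sorted(sorted(m) for m in members.values())


if __name__ == "__main__":
    figure1 = [(1, 2), (1, 3), (2, 3), (2, 4), (3, 4),
               (3, 5), (4, 5), (6, 7), (7, 8), (7, 9)]
    print(partition_into_cliques(range(1, 11), figure1))
    # [[1, 2, 3], [4, 5], [6, 7], [8], [9], [10]]
```

Recomputing the scores from the adjacency sets on every step keeps the sketch short; an implementation closer to the text would update the stored counts incrementally, as Step 2 of Algorithm 2 describes.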
When Algorithm 1 is applied, it is quite possible that several pairs of nodes have the same number of common neighbors and exclude the same number of edges if any of them is combined. It is also possible that the profit of grouping some set of nodes outweighs the profit of grouping other sets of nodes. Assume that the edges of the graph can be classified into several categories according to the profit measure of grouping each pair of connected nodes. Then a subgraph can be constructed from the edges which belong to the same category. The modified clique-partitioning algorithm uses these subgraphs to direct the clique partitioning and to avoid grouping pairs of nodes randomly. The subgraph in which pairs of nodes have the best profit measure is reduced first. Then the pairs of nodes having the next level of profit measure are collected and reduced. The procedure is applied to the other subgraphs in turn, and the process stops when a subgraph of a specified category or the original graph G becomes empty.

The transitive properties defined in Sections 4, 5, and 6 assume that the category identifiers j, k, and l refer to the same category. This is a special case of the generalized transitive property, named the loose-form transitive property. Illustrative examples of the modified clique-partitioning algorithm are provided in Sections 4, 5, and 6.

4 Allocation of Storage Elements

As indicated in [14], it is generally beneficial to assign more than one variable to the same physical location. This section discusses the issues of minimizing the number of storage elements.

4.1 Sufficient Conditions for Combining Two Variables

Given a set of variables, the problem is to combine those variables which can share a storage element. What are the sufficient conditions for combining two variables? A variable is live between the time of its definition and its last use. A variable is dead between the time of its last use and its next definition. If the live periods of two variables do not overlap, they have disjoint lifetimes. Obviously, two variables can be combined if they have disjoint lifetimes. In reality this constraint can be relaxed. Two variables A and B can be combined if their lifetimes overlap in such a way that one of them is used as a source and the other as the destination (or vice versa) in the same statement, and the variable used as the source is dead in the next time interval, i.e., the use is a "last use." Pure data transfers are special cases.

4.2 A Procedure for Compacting Variables

If there are n variables and each pair of variables is proved to be combinable (there are n(n-1)/2 different pairs), then these n variables can be assigned to the same physical location. Let the nodes of a graph be the variables, and let each pair of nodes which can be combined be joined by an edge. The resulting graph captures the lifetime relationships among all the variables. Since the goal is to assign these variables to the minimum number of physical locations, this is the clique-partitioning problem.

Combining a pair of variables which are related by a pure data transfer causes that transfer to be eliminated. This improvement reduces the number of control functions. If a horizontal list in the code sequence is occupied only by pure data transfers, it further results in a faster implementation. To take this property into account, the reduction of the original graph is separated into two phases. In the first phase the edges which are associated with pure data transfers in some time intervals are collected to form a subgraph. Let the original graph and the subgraph be represented by G and G1, respectively. The edges in G and G1 satisfy the loose-form transitive property, so Algorithm 3 in Subsection 3.3 can be applied. Having completed the partitioning of the subgraph, Algorithm 1 is then applied to the remaining edges of the original graph.

Once the variables have been compacted, the list of operation sequences is updated. The names which are grouped together are replaced by a single name, and operations which move the content of a variable to itself are deleted. The code sequences may then be compacted further, so the AEAP compaction is repeated once to make the final refinement of the operation sequences.

4.3 Lifetime Analysis

According to the previous discussion, lifetime analysis is an essential process for minimizing the number of storage elements. The problem of lifetime analysis is well understood in the area of compiler design; details can be found in [1].

4.4 Construction of the Lifetime Compatible Graph

Let the live/dead status of all the variables be represented by a lifetime list. The compatible graph is the graph consisting of all the edges which join combinable pairs of nodes. To construct a compatible graph, a complete graph which consists of all the nodes is first created. The lifetime list and the list of code sequences are then traced and inspected. Unless the conditions given in Subsection 4.1 are satisfied, the edges which join variables that are live in the same time interval are deleted from the graph. If an edge has already been deleted, this step is skipped. Those edges which are associated with pure data transfers in some time intervals are marked.

4.5 Grouping Registers into Scratch Pad Memories

Having assigned all the variables to suitable physical locations, the next step is to investigate the possibility of grouping several registers into scratch pad memories. Registers which have disjoint access times can be grouped together. This problem can also be formulated as the clique-partitioning problem.

4.6 An Example

Let the compacted code sequences in Table 2 be given, and assume that the program is itself a loop: having executed the statements in the last line, control passes back to the statements in the first line. Applying the lifetime analysis algorithm to the example, the status of the variables in each time interval is shown in Table 3.

Having derived the live/dead history of each variable, the compatible graph can be constructed. As mentioned before, a complete graph is first constructed; using Table 3, its edges are then pruned as follows. In the time interval labeled "1", the variables V1, V2, V3, V4, V6, V10, and V12 are live, so each edge formed by two of these live variables must be deleted from the graph. However, V2 and V3 are used as a source and a destination in the same statement, and the source variable is dead in the next time interval. Therefore, the edge (2,3) is not deleted. Repeating this procedure over the entire lifetime yields the compatible graph given in Table 4.

Table 3: Result of Lifetime Analysis (L = live, D = dead)

Time   V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 V14 V15
Entry  L  L  D  L  D  L  D  D  D  L   D   D   D   D   D
1      L  L  L  L  D  L  D  D  D  L   D   L   D   D   D
2      L  D  L  L  L  L  L  D  D  L   D   L   D   D   D
3      L  D  L  L  L  L  L  L  L  L   L   L   D   D   D
4      D  D  D  L  D  L  D  L  L  L   L   L   D   L   L
5      L  L  D  L  D  L  D  D  D  L   D   D   D   L   L
Exit   L  L  D  L  D  L  D  D  D  L   D   D   D   D   D

Table 4: The Compatible Variable Pairs (* marks pairs associated with pure data transfers)

(1,9)  (1,13) (1,14)* (2,3)  (2,5)  (2,7)  (2,8)   (2,9)  (2,11) (2,13)
(2,15)* (3,8) (3,13)* (3,14) (3,15) (4,13) (5,8)   (5,11) (5,13) (5,14)
(5,15) (6,13) (7,9)   (7,13) (7,14) (7,15) (8,13)  (8,14) (9,13) (9,15)
(10,13) (11,13) (11,14) (12,13) (12,15) (13,14) (13,15)

Among these combinable variable pairs, the edges marked with an asterisk are associated with pure data transfers in some time intervals; they are used to construct the second graph. Algorithm 3 is used to reduce these two graphs, which results in the composite nodes {1, 14}, {2, 15}, and {3, 13}. Applying Algorithm 1 to the reduced graph, the variables are finally partitioned into eight clusters: {1, 14}, {2, 7, 9, 15}, {3, 8, 13}, {4}, {5, 11}, {6}, {10}, and {12}. The variables in each of these clusters can be assigned to the same physical location. The code sequences in Table 2 can then be refined into the form in Table 5.

Table 5: Improved Code Sequences

V3 = V1 + V2 ;     V12 = V1 ;
V5 = V3 - V4 ;     V2  = V3 * V6 ;
V3 = V3 + V5 ;     V2  = V1 + V2 ;    V5 = V10 / V5 ;
V1 = V5 and V3 ;   V2  = V12 or V2 ;

5 Allocation of Data Operators

The allocation of data operators consists of two tasks. One is the combination of operators of the same kind. The other is the grouping of operators of various kinds into arithmetic and logic units (ALUs). The goal is to assign the data operations to the minimum number of clusters. The problem is again formulated as the clique-partitioning problem.

5.1 The Formulation

A data operator is called an isolated operator if it is exclusively assigned to one triple statement. What is the effect of grouping two isolated data operators into one unit? If the two operations are the same, the number of operators is reduced by one. In addition, depending on whether the corresponding source operands and destination variables are the same, the numbers of multiplexers and wired-broadcast trees may increase or decrease. What is the effect of merging an isolated operator into an ALU, or of combining two ALUs? First, the buses and gating elements connected to the input and output ports of the ALU can be shared. If the operations have common sources or destinations, the original gating elements for the input or output ports of these modules can also be shared. Therefore, it is generally beneficial to merge an isolated data operator into an ALU or to combine two ALUs [14].

An important issue in the allocation of data operators is choosing an appropriate set of operators to group together. Given two isolated operations, there are sixteen cases for the relationship between them [14]. These sixteen cases can be classified into eight categories, listed below.

1. G8: The operations and all three pairs of variables are the same.
2. G7: The operations are different, but the three pairs of variables are the same.
3. G6: The operations and two pairs of variables are the same; the third pair of variables is different.
4. G5: Two pairs of variables are the same; the operations and one pair of variables are different.
5. G4: The operations and one pair of variables are the same; the other two pairs of variables are different.
6. G3: One pair of variables is the same; the operations and the other two pairs of variables are different.
7. G2: The operations are the same; all three pairs of variables are different.
8. G1: The operations and all three pairs of variables are different.

All of these subgraphs satisfy the generalized transitive property. For simplicity, the loose-form transitive property is assumed for them. The algorithm for allocating data operators can be described as follows:

1. Create a complete graph whose nodes are the indices of all the data operators. Trace through the code sequences and delete the edges connecting nodes which are used simultaneously. Identify the category of each edge.
2. Collect the edges of Category 8. Use Algorithm 3 to reduce G and G8.
3. Having reduced the subgraph of Category 8, reduce the graph G together with the subgraphs of Categories 7, 6, 5, 4, 3, 2, and 1, one by one.

5.2 An Example

Let each operation in Table 5 be assigned a specific name; one such assignment is given in Table 6. Using the code sequence and the operator assignment, the compatible graph G is shown in Table 7. The integer following each edge is the category identifier of the edge.
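The eight categories listed in Subsection 5.1 reduce to two questions about a pair of operations: are the operations the same, and how many of the three variable pairs (first source, second source, destination) match? The Python sketch below encodes that reading for the operators of Table 6. It is illustrative only: the `Op` record and the function names are hypothetical, operands are compared position by position (no swapping of commutative operands), and `merged_category`, which gives a composite node the best category of its members, is our assumption rather than a rule stated in the paper. Under these assumptions the sketch reproduces the categories Table 7 lists for the edges (1,2), (1,4), (1,5), (1,8), and (5,8), and the updated categories 3 and 5 for (1,3) and (1,8) after nodes 1 and 5 are combined.

```python
from typing import NamedTuple, Sequence


class Op(NamedTuple):
    """One triple statement: dest = src1 <op> src2 (hypothetical encoding)."""
    dest: str
    op: str
    src1: str
    src2: str


# (operations equal?, number of equal variable pairs) -> category 1..8
_CATEGORY = {
    (True, 3): 8, (False, 3): 7,
    (True, 2): 6, (False, 2): 5,
    (True, 1): 4, (False, 1): 3,
    (True, 0): 2, (False, 0): 1,
}


def category(a: Op, b: Op) -> int:
    """Category of grouping two isolated operators (8 = most profitable)."""
    same_pairs = (a.src1 == b.src1) + (a.src2 == b.src2) + (a.dest == b.dest)
    return _CATEGORY[(a.op == b.op, same_pairs)]


def merged_category(members: Sequence[Op], other: Op) -> int:
    """Assumed update rule: a composite node keeps its members' best category."""
    return max(category(m, other) for m in members)


# Operators of Table 6, keyed by index, with operands in the printed order.
OPS = {
    1: Op("V3", "+", "V1", "V2"),
    2: Op("V5", "-", "V3", "V4"),
    3: Op("V2", "*", "V3", "V6"),
    4: Op("V3", "+", "V3", "V5"),
    5: Op("V2", "+", "V1", "V2"),
    6: Op("V5", "/", "V10", "V5"),
    7: Op("V1", "and", "V5", "V3"),
    8: Op("V2", "or", "V12", "V2"),
}

if __name__ == "__main__":
    print(category(OPS[1], OPS[5]))                   # the only Category 6 edge
    print(merged_category([OPS[1], OPS[5]], OPS[3]))  # (1,3) after merging 1 and 5
    print(merged_category([OPS[1], OPS[5]], OPS[8]))  # (1,8) after merging 1 and 5
```

The lookup table keeps the mapping from "same operation / number of matching pairs" to category explicit, so a different reading of the sixteen cases in [14] would change only `_CATEGORY`.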
V3 = V l +1 V2 ; V12 = V1 V5 : V3 -1 V4 ; V2 = V3 *1 V6 V3 = V3 +2 V5 ; V2 : V t +3 V2 ; V1 = V5 and I V3 ; V2 = V12 o r 1 V2 V5 : V l 0 / 1 V5 Operator I d e n t i f i e r s : Indices: +1 -1 "1 +2 +3 /1 and1 °rl 1 2 3 4 5 6 7 8 T a b l e 6: Assigning Operator Identifiers (1,2) 1 (1,3) 1 (1,4) 4 -(1,5) 6 (1,6) 1 (1,7) 1 (1,8) 3 (2,4) 3 (2,5) 1 (2,6) 1 (2.7) 3 (2,8) 1 (3,4) 3 (3,5) 3 (3,6) 1 (3,7) 3 (3,8) 3 (4.7) 3 (4,8) 1 (5,7) 1 (5,8) 5 (6,7) 3 (6,8) 1 Table 7: G: The Edges of the Original Compatible Graph The edge of Category 6 in G is retrieved to form the subgraph G6. G6 only consists of one edge. It is (1,5). The original graph G and the subgraph G6 are first reduced. The nodes 1 and 5 are combined. In the reduced graph the categories of the edges (1,3) and (1,8) are updated to 3 and 5 respectively. The reduction procedure is continued until the list of edges becomes empty. The data operators are finally grouped into three clusters. They are { 1,3, 5, 8 }, { 2, 4, 7 }, and { 6 }. Refining the Inilial A l l o c a t i o n An initial design might have a "join-node" in which more than one bus is connected to a single input port. It is necessary to insert a multiplexer in front of the input port. The "join-node" caP. easily be found by checking the data paths connected to an input port. If an input port is connected to more than one bus, ~hen the node needs to be refined. 6.3 An Example 6 A l l o c a t i o n of I n t e r c o n n e c t i o n Units This section discusses the issues in the allocation of interconnection units. 6.1 A l i g n m e n t of O p e r a n d s An operation may be either Commutative or noncommutative. For those commutative operations, the designer has the freedom of flipping the position of the two operands. If the operands of all the operations are suitably aligned, the number of interconnection units can be decreased. Let the operands of unary and noncommutative operations be collected in two sets. 
The operands of th6 commutative operations can be suitably aligned by comparing the operands with variables in these two sets. The example is based on the code sequence in Table 6 and the ALU's allocated in Section 5. Inspecting the operands of the operations associated with ALU2, it is found that the positions of the two operands of the statement "V1 = V5 a n d 1 V3" need to be flipped. It is changed into the form of " V l = V3 a n d 1 VS". Using the indices in Table 8, the compatible graph in Table 9 (G) is constructed. In Table 9, those edges which join interconnection variables with the same source or the same sink are enclosed by square brackets. The graph formed by these edges is called G1. Source Destination Indexing Name Name Integer ........................................ V1 V1 V2 V3 V3 V4 V5 V5 V6 V10 V12 ALUI.0ut ALU1.Out ALU2.Out ALU2.0Ut ALU2.0ut ALU3.0ut 6.2 The Formulation Interconnection variables which are never used simultaneously can be grouped together to fGrm buses. The g0al is to group the interconnection variables into the minimum number of clusters. The problem is again formulated into the clique-partitioning problem. To obtain a good bus style design, it is essential to minimize both the number of buses and the total number of drivers and receivers[13]. There is no profit in combining two interconnection variables which originate from different sources and destine to different sinks. On the other hand, it is generally beneficial to group those interconnection variables which originate from the same source or destine to the same sink to share a common bus. The details of the formulation are given below. A complete graph in which nodes consist of all the interconnection variables is first constructed. Then the code sequence is traced through. In each time interval, if two interconnection variables are used simultaneously, the edge formed by these two nodes is deleted. 
Those interconnection variables which are associated with the same source, even when they are used concurrently, can still share a common interconnection. Therefore, when we construct the compatible graph, these entries are not deleted. Let the compatible graph be represented by G. When the compatible graph is constructed, if two interconnection variables originate from the same source or destine to the same sink, the edge which joins these two vaiables is marked. All the marked edges are collected to form the second graph (named G1). The loose form transitive property is applicable to G and G1. Algorithm 3 in Subsection 3.3 can be applied. V12 ALUt. I n l ALUI.In2 ALUI.Inl ALU2.Inl ALU2.In2 ALUZ.In2 ALD3.In2 ALUt. In2 ALU3.Int ALUI.Inl V2 V3 V1 V3 V5 V10 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 Table 8: Indices of interconnection variables [1,2] (1.9) (1,16) (2,14) [4,5] (4,14) (6,10) [7,8] (8.13) (9.14) (10,16) (13,14) (14,17) (1,4) (1,10) (1.17) (2,16) (4,7) (4,15) (6,11) (7,9) (8.14) (9,15) (11,13) [13,15] [15,16] (1,5) (1,11) [2,4] (3,4) (4,8) (4,17) (6,13) (7,13) (8,16) (9,17) (11,15) (13,16) (16,17) (1,5) (1,12) (2,6) (3,6) (4,10) (5,13) (0,14) (7,16) (9,10) (10,11) (11,16) (13,17) (t,7) (1,14) (2,9) [3,9] [4,11] [6,7] (6,15) (8,9) (9,11) (10,13) (11.17) [14,15] (1,8) (1,15) [2,11] (3,16) (4,13) (6,8) (6,17) (8,11) (9,13) (10,14) [12,13] [14,16] Table 9: G: List of edges which join combinable interconnection variables In G1 nodes 14 and 16 have the maximum number of common neighbors. The are combined. G and G1 are reduced. The next node to be selected should be node 15. Inspecting the nodes which are connected to node 15 and the composite node { 14, 16 }, it is found that both (13,14) and (13,15) are contained in G. However, among them, only (13,15) belongs to G I . Since the edge (13,15) is being deleted, the Paper 31.4 495 edge (13,14) should be added into G I . The nodes 13 and 14 are then combined to form the composite node { 13, 14, 15, 16 }. 
The composite node {13, 14, 15, 16} forms a cluster. By repeatedly applying the above algorithm to G and G1, the interconnection variables are finally partitioned into eight groups: {13, 14, 15, 16}, {1, 2, 4, 11}, {6, 7, 8}, {3, 9}, {5}, {10}, {12} and {17}. Figure 2 depicts the completed allocation of the data-memory part.

Figure 2: Data Paths of the Example

7 Exploring the Design Space

Design tradeoffs are fundamental to exploring a design space. Cost and speed are generally used to define the tradeoffs. If the serialization of some statements makes the live periods of two variables satisfy the conditions given in Subsection 4.1, then these two variables can be assigned to the same register. Similarly, two operators of the same kind, two ALUs, or two buses can be grouped together if all the parallelism associated with them has been eliminated. These tradeoffs are called basic tradeoffs.

Let a base design be given that is directly translated from a VT and improved by the algorithms presented in Sections 4, 5, and 6. Assume that there are N basic tradeoffs. The effect of any one, any two, any three, and so on, of these tradeoffs can be considered, up to the extreme case of all N tradeoffs applied together. Since there are 2^N - 1 such nonempty combinations, only a limited number of composite tradeoffs are considered in order to limit the search space. By inspecting the improved code sequence and the allocated data paths in the previous sections, it is not difficult to find all the basic and composite tradeoffs for the example. We leave this as an exercise for the reader.

8 Conclusions and Future Work

This paper presents a procedure for data-memory allocation. The procedure has been programmed and, in some preliminary experiments, has produced designs nearly identical to commercially produced designs. Further research will focus on more extensive experimentation.

Acknowledgements

Comments and suggestions by Drs. Mario R. Barbacci, Stephen W. Director, and Donald E. Thomas are gratefully acknowledged.

References

[1] A. V. Aho and J. D. Ullman, "Principles of Compiler Design," Addison-Wesley, Reading, MA, 1977.
[2] J. G. Augustson and J. Minker, "An Analysis of Some Graph Theoretical Cluster Techniques," Journal of the ACM 17(4):571-588, October 1970.
[3] M. R. Barbacci, G. E. Barnes, R. G. Cattell, and D. P. Siewiorek, "The Symbolic Manipulation of Computer Descriptions: The ISPS Computer Description Language," Technical Report, Department of Computer Science, Carnegie-Mellon University, March 1978.
[4] C. Bron and J. Kerbosch, "Finding All Cliques of an Undirected Graph -- Algorithm 457," Communications of the ACM 16(9):575-577, September 1973.
[5] S. Davidson, D. Landskov, B. D. Shriver, and P. W. Mallett, "Some Experiments in Local Microcode Compaction for Horizontal Machines," IEEE Transactions on Computers, C-30(7), July 1981.
[6] S. W. Director, A. C. Parker, D. P. Siewiorek, and D. E. Thomas, "A Design Methodology and Computer Aids for Digital VLSI Systems," IEEE Transactions on Circuits and Systems, CAS-28(7), July 1981.
[7] M. McFarland, "The Value Trace: A Data Base for Automated Digital Design," Master's Thesis, Department of Electrical Engineering, Carnegie-Mellon University, December 1978.
[8] J. W. Moon and L. Moser, "On Cliques in Graphs," Israel Journal of Mathematics 3:23-28, March 1965.
[9] G. D. Mulligan and D. G. Corneil, "Corrections to Bierstone's Algorithm for Generating Cliques," Journal of the ACM 19(2):244-247, April 1972.
[10] M. C. Paull and S. H. Unger, "Minimizing the Number of States in Incompletely Specified Sequential Switching Functions," IRE Transactions on Electronic Computers EC-8:356-367, September 1959.
[11] E. M. Reingold, J. Nievergelt, and N. Deo, "Combinatorial Algorithms: Theory and Practice," Prentice-Hall, 1977.
[12] E. A. Snow, "Automation of Module Set Independent Register-Transfer Design," Ph.D. Thesis, Department of Electrical Engineering, Carnegie-Mellon University, April 1978.
[13] C. J. Tseng and D. P. Siewiorek, "The Modeling and Synthesis of Bus Systems," Proceedings of the Eighteenth Design Automation Conference, pages 471-478, ACM SIGDA and IEEE Computer Society DATC, June 1981.
[14] C. J. Tseng and D. P. Siewiorek, "A Note on the Automated Synthesis of Bus Style Systems," Technical Report, Department of Electrical Engineering, October 1982.
[15] S. Tsukiyama, M. Ide, H. Ariyoshi, and I. Shirakawa, "A New Algorithm for Generating All the Maximal Independent Sets," SIAM Journal on Computing 6(3):505-517, September 1977.