
Timing Analysis in a Logic Synthesis Environment

A goal of a logic synthesis system is the automatic generation of area optimised designs that meet timing requirements. The design process involves repeated timing analyses followed by appropriate modifications. We present fast new algorithms for system level timing analysis and for the generation of timing constraints to guide the redesign of portions of combinational logic. Our systematic approach correctly models designs that incorporate level sensitive latches controlled by multifrequency, as well as simple multi-phase, clocks. A new feature is that the minimum number of settling times is evaluated for the nodes of combinational networks with input transitions controlled by different clock signals. The computer program Hummingbird uses the algorithms presented. Hummingbird interfaces with other programs in the Berkeley Synthesis System through the OCT data base. For a digital signal processing chip, comprising 3681 standard cells, timing analysis is performed in 14.87 cpu seconds on a VAX 8800 running the ULTRIX operating system.

Nicholas Weiner and Alberto Sangiovanni-Vincentelli
Department of Electrical Engineering and Computer Sciences
University of California, Berkeley, California 94720

1 Introduction

For logic synthesis systems, such as the Berkeley Synthesis System, designs are specified as high level descriptions of combinational logic modules and of the interconnections between these modules and synchronising elements (clocked memory elements and drivers). Combinational logic implementations are initially generated without regard to timing requirements. A timing analyser is needed to consider both the design and the clock waveforms, to determine where timing problems may arise. To guide logic re-synthesis [1], the timing analyser must also provide delay constraints for logic switching paths.
Static CMOS standard cells provide one of the dominant design methodologies for automatic VLSI synthesis. We have paid particular attention to large networks of such cells, specifically, to the necessity to correctly model the behaviour of level sensitive (or "transparent") latches. From our experience with users of the Berkeley Synthesis System, we have identified the need to avoid assumptions concerning the clock waveforms, or the way in which the clocks are used to synchronise the system. Figure 1 shows a simple configuration in which inputs to a logic gate are updated at different times during the clock period. In this example the output from the logic gate is required to settle to two different valid states during each clock cycle. The logic gate is therefore "time multiplexed" within each overall clock period.

Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the ACM copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Association for Computing Machinery. To copy otherwise, or to republish, requires a fee and/or specific permission.

Figure 1: Logic with latches controlled by four different clock phases.

We make a distinction between component propagation-delay estimation and system timing analysis. A component propagation-delay is the time difference between a voltage transition at an input and the resulting transition at an output. System timing analysis makes use of component delay estimates to determine whether a composite system will behave "as intended", and if not, what the timing problems are. Component delay estimation techniques may use approximate analytical or numerical methods [2,3,4]. For standard cells, empirical delay estimation formulae are often used.
By separating component delay-estimation and system-timing analysis, different delay-estimation methods may be combined. In this paper we develop a systematic approach to system-level timing-analysis. Two algorithms are presented that make use of estimates of maximum component propagation delays. The first finds all paths that are "too slow". The second generates constraints to guide logic re-synthesis. The algorithms allow any set of clock signals, with any (harmonically related) frequencies and phase relationships, used arbitrarily for synchronisation. They also model transparent latches correctly. The assumptions that we make in Section 3 define the class of systems to which our analysis algorithms apply.

2 Related Work

We briefly overview some of the related work in system timing analysis. In 1980 McWilliams [5] presented a method in which portions of combinational logic are individually analysed, and timing violations at latch inputs are reported. The approach can handle complicated clocking schemes, but it can not model the behaviour of transparent latches. Further, identification of entire switching paths that need to operate more quickly is necessary when automatic redesign is intended. Hitchcock [6] gave a method for the analysis of clusters of combinational logic with assorted assertion times at the inputs and closure times at the outputs. The method identifies entire paths that are too slow. Hitchcock's method has been used for combinational logic analysis in the work described here. The advantage of calculating, separately, the rising and falling signal settling times was discussed in [7]. This technique is also used in the work described here.

More recently, Wallace and Sequin [8] and Szymanski [9] have considered all voltage transitions to result from transitions at primary inputs. The times of internal transitions are found by tracing forward. Relaxation results when a network contains directed cycles.
Transparent latches can be correctly handled, and so can multiphase architectures. For analysis of systems in which node voltages are updated more than once during each clock period, [8] attributes each transition to a clock edge. A number of settling times are thus computed for each node. In Section 7 we will show how, with a little pre-processing, the number of settling times that must be calculated for each node may be minimised. Even when combinational logic inputs come from latches controlled by two or three different clock phases, a single settling time is often sufficient to represent the timing at each node.

The approach taken by Jouppi [10], and in the work reported here, for the analysis of combinational logic networks, is to assume that voltage transitions at all nodes result from transitions at synchronising element outputs, and that there is a closure time at each synchronising element input. These times constrain the delay allowed in each combinational logic path. The transparent latch problem manifests itself as uncertainty, for each latch, in the input closure time and output assertion time. Jouppi presented a scheme to handle systems in which non-overlapping multi-phase clocks control transparent latches. Our algorithms are based upon a systematic approach in which arbitrary clocking schemes are modelled correctly.

26th ACM/IEEE Design Automation Conference, Paper 39.2. © 1989 ACM 0-89791-310-8/89/0006/0655 $1.50

3 Problem Definition

The following assumptions are made concerning the network of combinational logic and synchronising elements to be analysed:

• Across all switching elements data flow is from input terminals to output terminals;
• There are no directed cycles within any portions of combinational logic;
• All synchronising elements have the following three terminals: Data input; Control input; Data output. The data input determines the output value, the control input signal determines the output timing.
• The signal connected to the control input of every synchronising element is a monotonic combinational logic function of exactly one clock signal. This means that the control signal, when enabled, always switches in the same direction as the clock signal, or else always switches in the opposite direction.

Synchronising elements with further terminals (control-bar and output-bar) can be handled, but for purposes of explanation we will consider only the three terminals given above.

In addition to the assumptions concerning the network, we assume that the operation is synchronous. By this, we mean that all clock waveforms have harmonically related frequencies, and there is an overall period which is an integer multiple of the period of each clock signal.

We will define the system timing analysis problem in terms of a comparison between the behaviour of a system and its "intended behaviour". The intended behaviour is defined as the behaviour exhibited by an ideal system comprising the same network topology and controlled by the same clock signals, but in which:

• All synchronising elements switch with zero delay;
• All combinational logic switching paths from clock signal sources to the control inputs of synchronising elements switch with zero delay;
• All other combinational logic switching paths switch with arbitrarily small, but finite, delays (delays tending to zero).

This definition of intended behaviour is a little different to Szymanski's [9] "correct behaviour", which is that exhibited when the clock is run slowly enough.

The usual timing analysis task is to find tight upper bounds upon settling times of the transitions at all network nodes. The settling times are often referred to as the signal ready times. Required times can also be found. Required times can be traced backwards across network components in just the same way that ready times can be traced forwards.
For any combinational logic path from node $x$ to node $y$ the difference between the required time at node $y$ and the signal ready time at node $x$ gives an upper bound constraint upon the path propagation delay (sum of component propagation delays). If the constraint is satisfied, then the path is fast enough. Otherwise the path is too slow. Notice that speeding up a path that is already fast enough can not reduce the size of violations of any of these constraints. However, speeding up a path that is too slow will always reduce the size of a violation. An interesting feature of the definition of "too slow" is that it may apply to a set of combinational logic paths that form a directed cycle traversing two, or more, transparent latches.

The algorithms to be presented solve the following problem: Given a network of combinational logic and synchronising elements, conforming to the assumptions given above, and descriptions of the clock signals:

i) Find all paths that are too slow;
ii) For all nodes in paths that are too slow, find the ready and required times. For all other nodes, find ready and required times such that, for any two nodes in a combinational logic path, the difference between the ready and required times exceeds the path delay.

For any combinational logic path (or portion of one) the times generated indicate the speed-up required to make a slow path just fast enough, or else bound the degree to which a path may be slowed down.

4 Approach

In this section we present a number of definitions and a proposition that forms the basis of our slow-path algorithm.

Consider the output terminal of a synchronising element. The time at which the output signal appears, or is updated, is its assertion time. At the data input of a synchronising element, the time after which an input transition fails to be of use is the input closure time. At the control input of a synchronising element there are (potentially) two voltage transitions for each pulse of the controlling clock signal.
The arrival time of the control transition that causes output assertion is the assertion control time. The arrival time of the control transition that causes input closure is the closure control time. In reality these may, or may not, be the same time. For a trailing edge triggered latch, for example, the trailing edge of the control signal causes both input closure and output assertion. However, for a level sensitive (transparent) latch the leading edge of the control signal causes output assertion and the trailing edge causes input closure. The corresponding times in the associated ideal system are called, respectively, ideal assertion time, ideal closure time, ideal assertion control time and ideal closure control time.

We use a "generic" model of a synchronising element, that is controlled by a single clock pulse during each overall clock period. A synchronising element that is clocked at a frequency that is a multiple, $n$, of the overall clock frequency is represented by $n$ such elements connected in parallel. To each is assigned one of the $n$ clock pulses that occur during every overall clock period. In this way all $n$ sets of associated input closure, output assertion and control times are represented.

Transitions at clock generator output terminals are the clock edge times. These times are the same in the actual system and in the associated ideal system.

For a combinational logic path $p$ from synchronising element output $x$ to synchronising element data input $y$ we use the following definition. The ideal path constraint of path $p$, $D_p$, is given by the time that elapses between the ideal assertion time at $x$ and the very next ideal closure time at $y$.

A control path is a combinational logic path from the output of a clock signal generator to the control input of a synchronising element.
For a control path from clock generator terminal $x$ to control input $y$ the ideal path constraint, $D_p$, is the time that elapses between each controlling clock transition and the ideal closure control or assertion control time. For all control paths $D_p$ is identically zero.

An enable path is a combinational logic path from a synchronising element output to a synchronising element control input. For an enable path from terminal $x$ to terminal $y$, of synchronising element $\sigma$, the ideal path constraint is the time that elapses between the ideal assertion time at $x$ and one of the following two transitions of the clock signal that controls $\sigma$. The nature of the operation of the synchronising element, and of the enable logic, determines which of the clock edges is to be enabled/disabled.

Here are two example ideal path constraints:

a) $p$ is a combinational logic path from the output of level sensitive latch $\alpha$, synchronised by $\phi_\alpha$, to the data input of level sensitive latch $\beta$, synchronised by $\phi_\beta$. $D_p$ is the time between a leading edge of $\phi_\alpha$ and the next trailing $\phi_\beta$ edge.

b) $q$ is a combinational logic path from the output of trailing edge triggered latch $\gamma$, synchronised by $\phi_\gamma$, to the data input of trailing edge triggered latch $\delta$, synchronised by $\phi_\delta$. $D_q$ is the time between a trailing edge of $\phi_\gamma$ and the next trailing $\phi_\delta$ edge. A special case of this configuration is when $\phi_\delta = \phi_\gamma$. In this case $D_q$ is equal to exactly one $\phi_\delta$ clock period.

Figure 2(a) shows the model of a synchronising element. Notice that the model has two control inputs, two data inputs and two outputs. The two control inputs represent the different control functions of input closure and output assertion. The two data inputs represent the different input closure times that result from closure control and output assertion. The two outputs represent the different output assertion times that result from assertion control and input timing.
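As a rough illustration of examples (a) and (b), the following Python sketch measures the time from an ideal assertion edge to the very next ideal closure edge within a repeating overall clock period. The function name, the 100 ns overall period and the edge times are hypothetical values chosen for illustration. Treating a coincident edge as "one full period later" is an assumption made here so that the same-clock special case of example (b) yields exactly one clock period.

```python
def ideal_path_constraint(assertion_time, closure_times, period):
    """Time from an ideal assertion edge to the very next ideal closure edge.

    All times are positions within one overall clock period of length
    `period`; `closure_times` lists every ideal closure edge of the
    destination element within that period.
    """
    waits = []
    for closure in closure_times:
        wait = (closure - assertion_time) % period
        # Assumption: a coincident edge means the next occurrence, one full
        # period later, so the result always lies in the range (0, period].
        waits.append(wait if wait > 0 else period)
    return min(waits)


# Example (a): leading edge of phi_alpha at 0 ns, trailing edge of phi_beta at 70 ns.
d_p = ideal_path_constraint(0, [70], 100)

# Example (b), special case: both latches are clocked by the same trailing edge at 40 ns.
d_q = ideal_path_constraint(40, [40], 100)

print(d_p, d_q)
```

An element clocked several times per overall period would simply contribute several entries to `closure_times`, in the spirit of the parallel "generic" elements described earlier.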
Assertion time at the actual output is given by the maximum of the two output assertion times. Closure time at the actual input is given by the minimum of the two input closure times.

A real number is associated with every terminal. These are the terminal offsets. Their meanings are as follows. $O_{dz}$ and $O_{zd}$ give the input closure time and corresponding output assertion time. $O_{ac}$ gives the latest assertion control time required to achieve output assertion at time $O_{zc}$. $O_{dc}$ gives the input closure time corresponding to closure control at time $O_{cc}$. $O_{ac}$, $O_{zc}$ and $O_{zd}$ specify absolute times (within the overall clock period) as offsets with respect to the ideal output assertion time. $O_{cc}$, $O_{dc}$ and $O_{dz}$ are offsets with respect to the ideal input closure time.

Figure 2: a) Timing Model Of Synchronising Element. b) Simplified Model.

Our algorithms use the slightly simplified synchronising element model of Figure 2(b). $O_{cc}$ is set to the constant value of zero, which is a lower bound upon the closure control time. $O_{dc}$ is set to the constant value $-D_{setup}$, the required set-up time of the element. This guarantees that $\min(O_{dc}, O_{dz})$ will be a lower bound upon the input closure time. The remaining four offsets are as described above.

The behaviour and internal delays of any given synchronising element impose constraints upon the offsets. These take the form of upper and lower bounds, and of equalities involving associated pairs of offsets, and are called the synchronising element constraints. A description of the constraints for edge triggered and transparent latches, together with an example, is given in Section 5.

There are also constraints associated with each combinational logic path. These involve offsets of different synchronising elements. Let $p$ be a combinational logic path from synchronising element output $x$, with offset $O_x$, to synchronising element input $y$, with offset $O_y$.
Let $d^{\max}_p$ be the largest propagation delay that can occur between the start and end of the path. $d^{\max}_p$ is the sum of the worst (largest) component propagation delays. For path $p$ we have the following path constraint.

$$d^{\max}_p < D_p - O_x + O_y$$

There is one further set of timing constraints. Consider synchronising element $\beta$, controlled by clock signal $\phi_\beta$ with period $T_\beta$. For the system to behave as intended, the signal at the data input, $y$, must not be updated more than $T_\beta$ before the input closure time. We can represent this requirement with a second constraint for each combinational logic path ending at $y$. For a path from terminal $x$ to $y$, with minimum path delay $d^{\min}_p$, the supplementary path constraint is

$$d^{\min}_p > D_p - O_x + O_y - T_\beta$$

If there exists a solution to all of these constraints (synchronising element constraints, path constraints and supplementary path constraints) then the system works as intended.

With respect to a combinational logic path from a synchronising element output to a synchronising element input the following holds:

Let $X$ be the set of all combinations of offsets that satisfy the synchronising element constraints. Combinational logic path $p$ is too slow if, and only if, either:

• $\forall x \in X$, $p$ does not satisfy its path constraint; or
• $\exists$ a set of paths $Q = \{q_1, q_2, \ldots\}$ such that, $\forall x \in X$, if $p$ satisfies its path constraint then some $q \in Q$ does not, and if all $q \in Q$ satisfy their path constraints then $p$ does not.

To illustrate this proposition we will discuss a network in which a transparent latch is interposed between two portions of combinational logic. Suppose that by setting the latch input closure and output assertion times at the end of the control pulse, the first portion of logic meets the resulting timing constraints but that the second portion does not. Then suppose that latch input closure and output assertion times are moved towards the beginning of the control pulse.
Eventually the second portion of logic meets the corresponding timing constraints, but by this time the first portion no longer does. Both paths are then too slow, by the second condition of the proposition.

In large networks of CMOS logic, for which the behavioural assumptions made in this paper apply, timing problems are almost always due to paths that are too slow. However, even if all paths are fast enough the system may not work as intended due to non-satisfied supplementary path constraints, resulting from badly asymmetric control path delays (e.g. clock skew). Our algorithms do not detect these problems.

5 Synchronising Element Models

In the previous section we introduced the general model of a synchronising element. In this section we present the details of the edge triggered and transparent latch models used in our algorithms.

Trailing edge triggered latch: A trailing edge triggered latch latches its data input and updates its output on the trailing edge of each control pulse. In other words, both input closure and output assertion are controlled by the trailing edge of each control pulse and all four offsets are specified with respect to this edge. For this synchronising element, the timing of the data input and output are independent. This is modeled by setting $O_{dz}$ to zero, so that $\min(O_{dc}, O_{dz}) = O_{dc}$, and by setting $O_{zd}$ to zero so that $\max(O_{zc}, O_{zd}) = O_{zc}$. Let $D_{setup}$ be the data set-up time and $D_{cz}$ be the delay between the control input and the output ($D_{setup}, D_{cz} \ge 0$). The constraints upon the offsets are:

$$O_{dc} = -D_{setup}; \quad O_{dz} = 0; \quad O_{zd} = 0; \quad O_{ac} \ge 0; \quad O_{zc} = O_{ac} + D_{cz}.$$

Transparent latch: During each pulse at the control input of a level sensitive latch data may flow from the input to the output. On the trailing edge of the control pulse the input is latched and the output remains static between control pulses.
The ideal input closure time is the time of the trailing edge of the clock pulse and the ideal output assertion time is the time of the leading edge. Let $D_{setup}$, $D_{dz}$ and $D_{cz}$ be the data set-up time, and the delays from the data and control inputs to the output, and let $W$ be the width of the control pulse ($D_{setup}, D_{dz}, D_{cz}, W \ge 0$). The constraints upon the offsets are listed below. The last condition is shown graphically in Figure 3.

$$O_{dc} = -D_{setup}; \quad O_{dz} \le -D_{dz}; \quad O_{zd} \ge 0; \quad O_{ac} \ge 0; \quad O_{zc} = O_{ac} + D_{cz}; \quad O_{zd} = W + O_{dz} + D_{dz}.$$

Figure 3: Relationship Between Offsets For A "Transparent" Synchronising Element.

As an example, consider a transparent latch, with no internal delays, controlled during each clock period by a 20 ns clock pulse. Suppose the output is asserted 5 ns after the beginning of the control pulse, then $O_{zd} = 5$ ns and $O_{dz} = -15$ ns. If there is a delay of 2 ns between the clock source and the control input of the latch then $O_{ac} = O_{zc} = 2$ ns. These four offsets are consistent with the constraints of the model.

Clocked tristate drivers are modeled in the same way as transparent latches.

6 Algorithms

The algorithms presented here find all of the slow paths in a network conforming to the assumptions given in Section 3, and generate timing constraints, as defined in Section 3, to guide logic re-synthesis.

For combinational logic path $p$ from synchronising element output $x$, with offset $O_x$, to synchronising element input $y$, with offset $O_y$, and having ideal path constraint $D_p$, path slack is defined as $D_p - O_x + O_y - d_p$. This is the amount by which the path constraint is satisfied. For a path that does not satisfy its path constraint the path slack is negative.

Let $P$ be the set of combinational logic paths that emanate from (converge to) synchronising element terminal $t$. The node slack, $n_t$, at $t$ is defined as $\min_{p \in P} s_p$, where $s_p$ is the path slack of path $p$.
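The path slack and node slack definitions can be sketched directly by path enumeration, as below. This is only a minimal illustration: the `Path` record, the terminal names and all numeric values are hypothetical, and $d_p$ is assumed to be the maximum path propagation delay used in the path constraint.

```python
import math
from dataclasses import dataclass


@dataclass
class Path:
    src: str    # synchronising element output terminal x where the path starts
    dst: str    # synchronising element input terminal y where the path ends
    D: float    # ideal path constraint D_p
    d: float    # path propagation delay d_p (assumed to be the maximum delay)


def path_slack(p, offset):
    """s_p = D_p - O_x + O_y - d_p; negative when the path constraint fails."""
    return p.D - offset[p.src] + offset[p.dst] - p.d


def node_slack(t, paths, offset):
    """n_t = minimum path slack over paths emanating from, or converging to, t."""
    touching = [p for p in paths if p.src == t or p.dst == t]
    return min((path_slack(p, offset) for p in touching), default=math.inf)


# Hypothetical network: latch output "a.z" drives latch inputs "b.d" and "c.d".
paths = [Path("a.z", "b.d", D=30.0, d=25.0),
         Path("a.z", "c.d", D=30.0, d=32.0)]
offset = {"a.z": 2.0, "b.d": 0.0, "c.d": 0.0}

for terminal in offset:
    print(terminal, node_slack(terminal, paths, offset))
```

Enumerating paths in this way is only practical for small examples; Section 7 of the paper explains why a block method is used instead for the real slack computations.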
If $x$ is a set of offsets for which the node slack, $n_t$, at node $t$ is positive, then the path constraints of all paths in $P$ are satisfied. If $t$ is a synchronising element input and the associated offset is decreased by any positive $\delta < n_t$ then the path constraints of all paths in $P$ will still be satisfied. If $t$ is a synchronising element output and the associated offset is increased by any positive $\delta < n_t$ then the path constraints of all paths in $P$ will still be satisfied. If the offset is adjusted by exactly $n_t$ then, for one path in $P$, the path constraint will not be satisfied as there will be an exact equality. If the offset is adjusted by more than $n_t$, then for at least one path in $P$ the path constraint will not be satisfied, nor will there be exact equality.

We next define the operations of "complete" and "partial" "slack transfer". These may be seen as the donation of spare time (possibly all of it, in the case of complete slack transfer) by one combinational logic path to an adjacent one.

Let $x$ be an input and $y$ be the corresponding output of a simplified synchronising element model. Let $O_x$ and $O_y$ be the associated offsets (which satisfy the synchronising element constraints). Forward slack transfer is achieved by decreasing $O_x$ and $O_y$ by equal positive quantities. Backward slack transfer is achieved by increasing $O_x$ and $O_y$ by equal positive quantities. Let $n_t$ be the node slack at terminal $t$ and $m$ be the maximum decrease allowed in $O_x$ and $O_y$ by the synchronising element constraints. Complete forward slack transfer is the process of decreasing the offsets by $\min(n_x, m)$, provided $\min(n_x, m) > 0$. Partial forward slack transfer is the process of decreasing the offsets by $\min(n_x/n, m)$, provided $\min(n_x/n, m) > 0$, where $n$ is any real number $> 1$. Complete and partial backward slack transfer are similarly defined. If the relevant inequality does not hold, then the offsets are not adjusted and we say no slack is transferred.
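The slack transfer operations on a single synchronising element can be sketched as follows. The function names and numbers are hypothetical. The text defines only the forward case explicitly, so the backward version shown here (driven by the node slack at the output and by the maximum increase allowed) is an assumption about what "similarly defined" means.

```python
def forward_slack_transfer(o_x, o_y, n_x, m, n=1.0):
    """Decrease both offsets by min(n_x / n, m) when that amount is positive.

    o_x, o_y: offsets at the element's input x and output y
    n_x:      node slack at the input x
    m:        maximum decrease allowed by the synchronising element constraints
    n:        1.0 gives complete transfer, any n > 1 gives partial transfer
    Returns the new (o_x, o_y) and the amount transferred.
    """
    amount = min(n_x / n, m)
    if amount <= 0:
        return o_x, o_y, 0.0          # no slack is transferred
    return o_x - amount, o_y - amount, amount


def backward_slack_transfer(o_x, o_y, n_y, m, n=1.0):
    """Assumed mirror image: increase both offsets by min(n_y / n, m),
    where n_y is the node slack at the output and m the maximum increase."""
    amount = min(n_y / n, m)
    if amount <= 0:
        return o_x, o_y, 0.0
    return o_x + amount, o_y + amount, amount


# Complete transfer donates all 4 units of input-side slack to the output side.
print(forward_slack_transfer(-5.0, 5.0, n_x=4.0, m=10.0))
# Partial transfer with n = 2 donates only half of it.
print(forward_slack_transfer(-5.0, 5.0, n_x=4.0, m=10.0, n=2.0))
```

Returning the transferred amount makes it easy for a caller to detect the "no slack was transferred" condition that the algorithms use as a stopping test.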
If we take any set of offsets that satisfy all synchronising element constraints and let $S$ be the set of paths that satisfy the corresponding path constraints, or for which strict equality holds, then if any one of these slack transfer operations is performed and $S'$ is the corresponding new set of paths, then $S' \supseteq S$.

Iterations 1 and 2 of Algorithm 1 remove surplus time from paths with positive slack, leaving non-negative slacks. Iterations 3 and 4 return some time to all paths that are fast enough, so that they end up with (strictly) positive slacks. All nodes in paths that are too slow end up with non-positive slacks. Because of the simplified synchronising element model used, nodes in paths that are marginally fast enough may be identified as too slow. Iterations 1 and 2 each complete in a number of cycles at most one more than the number of synchronising elements in a directed path, typically less than ten.

For the description of Algorithm 2 we define one further type of time transfer operation. Slack is "snatched" when a combinational logic path takes time, that it needs, from an adjacent path, regardless of whether the adjacent path can spare it. Precisely, let $x$ be an input and $y$ the corresponding output of a synchronising element. Let $O_x$ and $O_y$ be the associated offsets (that satisfy the synchronising element constraints). Let $n_y$ be the node slack at terminal $y$ and $m$ be the maximum decrease allowed in $O_x$ and $O_y$ by the synchronising element constraints. If $n_y$ is negative, then forward time snatching is achieved by decreasing the offsets by $\min(-n_y, m)$. If $\min(-n_y, m)$ is not positive then the offsets are not adjusted and we say that no time is snatched. Backward time snatching is similarly defined.

Iteration 1 of Algorithm 2 traces signal ready times forward through the network, stopping when the actual times have been found for nodes in paths that are too slow.
Iteration 2 traces required times backwards and stops when the actual times have been found for nodes in paths that are too slow. For each node not in a path that is too slow the times generated are an upper bound on the ready time, and lower bound on the required time, such that the former is smaller than the latter. Each iteration completes in a number of cycles at most one more than the number of synchronising elements in a directed path.

Algorithm 1 (Identification of Slow Paths)

Initialise: Select any set of offsets satisfying the synchronising element constraints.

Iteration 1:
1a) Find node slack at every synchronising element terminal.
1b) If all slacks > 0 then stop (system behaves as intended).
1c) Perform complete forward slack transfer across all synchronising elements.
1d) If no slack was transferred then go to iteration 2.
1e) Go to 1a.

Iteration 2:
2a) Find node slack at every synchronising element terminal.
2b) If all slacks > 0 then stop (system behaves as intended).
2c) Perform complete backward slack transfer across all synchronising elements.
2d) If no slack was transferred then go to iteration 3.
2e) Go to 2a.

Iteration 3 - Repeat once for each complete backward iteration made:
3a) Find node slack at every synchronising element terminal.
3b) Perform partial forward slack transfer across all synchronising elements.

Iteration 4 - Repeat once for each complete forward iteration made:
4a) Find node slack at every synchronising element terminal.
4b) Perform partial backward slack transfer across all synchronising elements.

Final step: Find all node slacks.

Algorithm 2 (Timing Constraint Generation)

Initialise: Use Algorithm 1 to generate initial offsets.

Iteration 1:
1a) Find node slack at every synchronising element terminal.
1b) Snatch time backward across all synchronising elements.
1c) If any time was snatched then go to 1a.
Record ready times at all cell inputs.

Iteration 2:
2a) Find node slack at every synchronising element terminal.
2b) Snatch time forward across all synchronising elements.
2c) If any time was snatched then go to 2a.
Record required times at all cell outputs.

7 Slack Computations

The algorithms presented in Section 6 make use of slack values at synchronising element terminal nodes. These could be calculated directly, as defined. Such a path enumeration procedure is computationally expensive. Hitchcock [6] introduced the much faster block method. The disadvantage of the block method is that "false paths" (i.e. paths that can not actually be sensitised) can not be discarded, and so the generated propagation delays and slacks tend to be pessimistic. Pessimistic slacks (i.e. too small) are safe, however. As speed is an important issue for a system timing analyser to be used in an analysis-redesign loop, and as we want to provide timing analysis for systems at arbitrary levels of abstraction (not just at the level of the most primitive logic gates) we decided to use the straight block analysis method for our slack computations.

A cluster is defined as a maximal connected network of combinational logic elements. All inputs to a cluster are synchronising element outputs and all outputs from a cluster are synchronising element inputs. Synchronising element offsets specify assertion times at cluster inputs and closure times at cluster outputs. We start by calling the cluster input assertion times node ready times and calculate ready times for all of the remaining cluster nodes using equation 1. The assumption that there are no directed cycles within any portions of combinational logic guarantees that all of the ready times can be calculated in this way. We calculate a slack value at each cluster output (synchronising element input) as the difference between the closure time and the ready time. Slacks for the remaining nodes in the cluster are then calculated using equation 2. At each circuit node the data required time is given by the node ready time plus the node slack.
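The cluster analysis just described can be sketched as a forward pass for ready times followed by a backward pass for slacks. This is a minimal sketch rather than the Hummingbird implementation: the recurrences used are the usual block-method forms (ready time at an output is the latest input ready time plus delay; slack at an input is the smallest value of output slack plus output ready time minus input ready time minus delay), which is an assumption here, and the data layout, node names and delays are hypothetical.

```python
import math


def block_analysis(components, assertion, closure):
    """Block-method ready times and slacks for one cluster.

    components: list of (output, {input: delay}) in topological order
    assertion:  ready (assertion) times at the cluster inputs
    closure:    closure times at the cluster outputs
    Returns (ready, slack) dictionaries covering every node.
    """
    ready = dict(assertion)
    for z, delays in components:                      # forward pass
        ready[z] = max(ready[i] + p for i, p in delays.items())

    # Slack at each cluster output is closure time minus ready time.
    slack = {z: closure[z] - ready[z] for z in closure}
    for z, delays in reversed(components):            # backward pass
        s_z = slack.get(z, math.inf)                  # unconstrained if unused
        for i, p in delays.items():
            s = s_z + ready[z] - ready[i] - p
            slack[i] = min(slack.get(i, math.inf), s)
    return ready, slack


# Hypothetical cluster: n1 = g1(a, b), out = g2(n1, b).
components = [("n1", {"a": 3, "b": 1}),
              ("out", {"n1": 2, "b": 4})]
ready, slack = block_analysis(components, {"a": 0, "b": 2}, {"out": 8})

# Required time at each node is ready time plus slack.
required = {node: ready[node] + slack[node] for node in ready}
print(ready, slack, required)
```

Each component is visited once per pass, so the cost grows with the number of component input-output pairs rather than with the number of paths, which is the speed advantage the text attributes to the block method.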
Equations (1) and (2): Ready-time and slack evaluation formulae. I - component inputs, Z - component outputs, R - ready time, S - slack, P - input to output propagation delay.

For a cluster within a system incorporating assorted types of synchronising elements, and controlled by different clocks, the cluster input assertion times and output closure times are defined as offsets with respect to various different reference times (the ideal assertion and closure times). In order to perform the above cluster analysis, it is necessary for all of these times to be known with reference to the same point in time. We can visualise the process of converting all of the values to offsets from the same reference point as follows. First it is necessary to “break open” the clock period. This will produce an interval of time, of one overall period in length, in which the locations of all of the ideal assertion and closure times (i.e. clock edges) are well defined and into which we can easily place all of the input assertion times and output closure times. We then choose any point for use as a common reference time against which to state all of the assertion and closure times.

Figure 4: Directed Graph Example. a) Clock waveforms; b) Directed graph representing clock; c) Cluster annotated with ideal input assertion times and ideal output closure times; d) Directed graph completed for this cluster.

The problem lies in deciding where to break open the clock cycle. A bad choice results in a path always having an input assertion time after its output closure time. To decide where to break open the clock cycle it is necessary to look at the assumption of intended behaviour and the relationships that this imposes between pairs of clock edges. From the assumption of intended behaviour we know that:
  i) control paths have ideal path constraints of exactly zero;
  ii) all other paths have ideal path delays that are strictly positive and equal to at most one overall clock period.
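The effect of the choice of break point can be illustrated with a small sketch. The function names, the representation of clock edges as times within one overall period, and the example values are hypothetical and chosen only for illustration. The sketch also assumes that no path has its ideal assertion and closure on the same clock edge (a full-period path), a case it does not handle.

```python
def broken_open_position(edge_time, break_time, period):
    """Position of a clock edge inside the period broken open at break_time."""
    return (edge_time - break_time) % period


def ideal_constraint(assert_time, close_time, break_time, period):
    """Ideal closure position minus ideal assertion position for one path."""
    a = broken_open_position(assert_time, break_time, period)
    c = broken_open_position(close_time, break_time, period)
    return c - a


def usable_break(paths, break_time, period):
    """True if every (assertion, closure) pair of a non-control path sees
    a strictly positive ideal constraint in this broken-open period."""
    return all(ideal_constraint(a, c, break_time, period) > 0
               for a, c in paths)


# Hypothetical clock: eight edges A..H at times 0..7, overall period 8.
PERIOD = 8
A, C, E = 0, 2, 4

# A path asserted at edge E and closed at edge C:
ideal_constraint(E, C, break_time=E, period=PERIOD)  # 6: E precedes C
ideal_constraint(E, C, break_time=A, period=PERIOD)  # -2: a bad choice

# Adding a path asserted at C and closed at A leaves no single usable break:
both = [(E, C), (C, A)]
[b for b in range(PERIOD) if usable_break(both, b, PERIOD)]  # []
```

In the last case no single break point serves both paths, so more than one analysis pass would be needed for such a cluster.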
It is not always possible to break open the clock cycle so that the resulting time between ideal input assertion time and ideal output closure time is non-negative for all paths through the cluster. Figure 1 gave a simple case in which two cluster analysis passes are required. As a pre-processing stage, we decide how many analysis passes are needed for each cluster, where to break open the clock period for each pass and, for each cluster output, which analysis applies. During an analysis that does not apply to a specific output we set the node slack to a large number before performing the slack calculations. Having completed all of the necessary passes, the smallest slack seen at each node is the required node slack.

We next present an algorithm for selecting a set, of smallest possible size, of passes necessary to find all slacks. A directed graph is constructed to represent the sequence of occurrence of the clock edges. Parts (a) and (b) of Figure 4 show a set of clock waveforms and the corresponding graph. Each of the ways in which it is possible to break open the clock period is represented by the removal of a single arc. Next we consider all cluster input-output combinations between which switching paths exist (other than control paths) and represent the clock edge ordering required (for the ideal path constraints to be strictly positive) by adding extra arcs to the graph. Figure 4(c) shows an example of a small cluster annotated with the names of the clock edges that are the ideal assertion and closure times. Figure 4(d) gives the completed directed graph for this cluster. A broken open clock period that satisfies the requirement represented by an extra arc is one that is represented by the removal of an original arc that appears after the head, and before the tail, of the extra arc. For example, the requirement that edge E occur before edge C is satisfied by the broken open period represented by removal of the original arc from node D to node E.
The clock edges then occur in the order E - F - G - H - A - B - C - D, in which edge E is before edge C. The minimum sized set of analysis passes required is represented by the minimum sized set of arcs that has to be removed from the underlying clock graph so that one member lies between the head and tail of every added arc. We find such a set by exhaustive search of the graph, starting with the removal of each single original arc; then we try all possible pairs, and so on until the above condition is satisfied. The graphs are usually small and very seldom is it necessary to remove more than two arcs. For each cluster output we find the broken open clock period within which its ideal closure time appears closest to the end. The output node slack is calculated during the corresponding cluster analysis pass.

Algorithm 3 (Analysis-Redesign Loop)
Synthesise initial area optimised combinational logic modules.
Until all paths are fast enough:
  Perform timing analysis to identify all paths that are too slow;
  Provide input data ready times and output required times for all combinational logic modules traversed by paths that are too slow;
  Select one such module and speed up slow paths.

8 Implementation and Results

The algorithms described here have been implemented in the computer program “Hummingbird”, written in the C programming language, which interfaces with other programs in the Berkeley Synthesis System via the OCT data base. All experiments so far have been with networks of standard cells. Propagation delays for the standard cells have been estimated using delay evaluation expressions that take into account the connected loads. For combinational logic modules the delays have been combined to generate estimates of the module propagation delays. Hummingbird has an interactive mode in which, for example, changes may be made to the shapes of the clock waveforms to determine the effect on system timing.
Adjustments may also be made to component delays. One option that users have is to flag all slow paths in the OCT data base. If the design has been placed and routed, the slow paths may then be viewed during a VEM graphical editing session. Algorithm 3 shows how we propose to automate the analysis/re-design process. Singh et al. [1] have shown how to choose the combinational logic module that has most potential for speed up to meet timing constraints, and also how to achieve the speed up.

Table 1 shows the run times for a number of examples. DES is a complete data encryption chip, made up from 3681 standard cells. ALU is a portion of a CPU chip made up from 899 standard cells. SMlF is a 12 bit finite state machine described as a “flattened” network of standard cells. SMlH is a “hierarchical” description of the same machine in which the combinational logic is contained in a single module. Pre-processing times include the times taken for generating combinational logic clusters and for performing the algorithm described in Section 7. The analysis times are the times taken to perform Algorithm 1. Data input and output times are not shown. We point out that the number of iterations required, and hence the run times, depend upon the specified clock speeds.

Table 1: Run times in VAX 8800 cpu seconds.

9 Conclusion

We have described the need for system timing analysis in a logic synthesis environment and have noted the need to correctly model static CMOS logic synchronised by complicated multi-frequency clocking schemes. A new systematic approach to system timing analysis has been proposed and fast new algorithms have been presented. The algorithms identify all paths that are “too slow” and provide timing constraints for use by a combinational logic re-synthesis program. A new feature is that the minimum number of voltage settling times are calculated for nodes of combinational networks with input transitions controlled by different clock signals.
The algorithms presented have been implemented in the computer program Hummingbird. Run-time statistics have been provided, and indicate that the method is, indeed, very fast.

Acknowledgement

Early discussions with Professor Robert Brayton were influential in determining the direction of this research. The authors also wish to acknowledge many useful conversations with Kanwar Jit Singh and with Gary Gannot, of Intel. This research has been funded from D.A.R.P.A. grant N00039-87-C0182, by the MICRO program of the State of California, Hughes Aircraft, Intel and Rockwell.

References

[1] K. J. Singh, A. R. Wang, R. K. Brayton, and A. Sangiovanni-Vincentelli. Timing optimization of combinational logic. In International Conference On Computer-Aided Design, IEEE, 1988.
[2] J. K. Ousterhout. A switch-level timing verifier for digital mos vlsi. IEEE Transactions on Computer-Aided Design, CAD-4, No. 3, July 1984.
[3] S. H. Hwang, Y. H. Kim, and A. R. Newton. An accurate delay modeling technique for switch-level timing verification. In 23rd Design Automation Conference, ACM IEEE, 1986.
[4] G. De Micheli. Performance-oriented synthesis in the yorktown silicon compiler. In International Conference On Computer-Aided Design, IEEE, 1986.
[5] T. M. McWilliams. Verification of timing constraints on large digital systems. In 17th Design Automation Conference, ACM IEEE, 1980.
[6] R. B. Hitchcock, Sr. Timing verification and the timing analysis program. In 19th Design Automation Conference, ACM IEEE, 1982.
[7] L. C. Bening, A. L. Alexander, and J. E. Smith. Developments in logic network path delay analysis. In 19th Design Automation Conference, ACM IEEE, 1982.
[8] D. E. Wallace and C. H. Sequin. Atv: an abstract timing verifier. In 25th Design Automation Conference, ACM IEEE, 1988.
[9] T. G. Szymanski. Leadout: a static timing analyzer for mos circuits. In International Conference On Computer-Aided Design, IEEE, 1986.
[10] N. P. Jouppi. Timing analysis for nmos vlsi. In 20th Design Automation Conference, ACM IEEE, 1983.