Timing Analysis In A Logic Synthesis Environment
Nicholas Weiner and Alberto Sangiovanni-Vincentelli
Department of Electrical Engineering and Computer Sciences
University of California, Berkeley, California 94720
Abstract
A goal of a logic synthesis system is the automatic generation of area optimised designs that meet timing requirements.
The design process involves repeated timing analyses followed by appropriate modifications.
We present fast new algorithms for system level timing analysis and for the generation of timing constraints
to guide the re-design of portions of combinational
logic.
Our systematic approach correctly models designs that incorporate level sensitive latches controlled by multi-frequency, as well as simple multi-phase, clocks. A new feature is that the minimum number of settling times is evaluated for the nodes of combinational networks with input transitions controlled by different clock signals.
The computer program Hummingbird
uses the algorithms presented. Hummingbird
interfaces with other
programs in the Berkeley Synthesis System through the
OCT data base. For a digital signal processing chip,
comprising 3681 standard cells, timing analysis is performed in 14.87 cpu seconds on a VAX 8800 running the
ULTRIX operating system.
1 Introduction
For logic synthesis systems, such as the Berkeley Synthesis System, designs are specified as high level descriptions of combinational logic modules and of the interconnections between these modules and synchronising elements (clocked memory elements and drivers). Combinational logic implementations are initially generated without regard to timing requirements. A timing analyser is needed to consider both the design and the clock waveforms, to determine where timing problems may arise. To guide logic re-synthesis [1], the timing analyser must also provide delay constraints for logic switching paths.
Static CMOS standard cells provide one of the dominant design methodologies for automatic VLSI synthesis. We have paid particular attention to large networks of such cells, specifically, to the necessity to correctly model the behaviour of level sensitive (or “transparent”) latches. From our experience with users of the Berkeley
Synthesis System, we have identified the need to avoid
assumptions concerning the clock waveforms, or the way
in which the clocks are used to synchronise the system.
Figure 1 shows a simple configuration in which inputs to a logic gate are updated at different times during the clock period. In this example the output from the logic gate is required to settle to two different valid states during each clock cycle. The logic gate is therefore “time multiplexed” within each overall clock period.
Figure 1: Logic with latches controlled by four different clock phases.
We make a distinction between component propagation-delay estimation and system timing analysis. A component propagation-delay is the time difference between a voltage transition at an input and the resulting transition at an output. System timing analysis makes use of component delay estimates to determine whether a composite system will behave “as intended”, and if not, what the timing problems are.
Component delay estimation techniques may use approximate analytical or numerical methods [2,3,4]. For
standard cells, empirical delay estimation formulae are
often used. By separating component delay-estimation
and system-timing
analysis, different delay-estimation
methods may be combined.
In this paper we develop a systematic approach to system-level timing-analysis. Two algorithms are presented that make use of estimates of maximum component propagation delays. The first finds all paths that are “too slow”. The second generates constraints to guide logic re-synthesis. The algorithms allow any set of clock signals, with any (harmonically related) frequencies and phase relationships, used arbitrarily for synchronisation. They also model transparent latches correctly. The assumptions that we make in Section 3 define the class of systems to which our analysis algorithms apply.
2 Related Work
We briefly overview some of the related work in system timing analysis. In 1980 McWilliams [5] presented a method in which portions of combinational logic are individually analysed, and timing violations at latch inputs are reported. The approach can handle complicated clocking schemes, but it can not model the behaviour of transparent latches. Further, identification of entire switching paths that need to operate more quickly is necessary when automatic redesign is intended. Hitchcock [6] gave a method for the analysis of clusters of combinational logic with assorted assertion times at the inputs and closure times at the outputs. The method identifies entire paths that are too slow. Hitchcock’s method has been used for combinational logic analysis in the work described here. The advantage of calculating, separately, rising and falling signal settling times was discussed in [7]. This technique is also used in the work described here.
More recently, Wallace and Sequin [8] and Szymanski [9] have considered all voltage transitions
to result
from transitions
at primary inputs.
The times of internal transitions
are found by tracing forward. Relaxation results when a network contains directed cycles.
26th ACM/IEEE Design Automation Conference, Paper 39.2
© 1989 ACM 0-89791-310-8/89/0006/0655 $1.50
Transparent latches can be correctly handled, and so can multiphase architectures. For analysis of systems in which node voltages are updated more than once during each clock period, [8] attributes each transition to a clock edge. A number of settling times are thus computed for each node. In Section 7 we will show how, with a little pre-processing, the number of settling times that must be calculated for each node may be minimised. Even when combinational logic inputs come from latches controlled by two or three different clock phases, a single settling time is often sufficient to represent the timing at each node.
The approach taken by Jouppi [10], and in the work reported here, for the analysis of combinational logic networks, is to assume that voltage transitions at all nodes result from transitions at synchronising element outputs, and that there is a closure time at each synchronising element input. These times constrain the delay allowed in each combinational logic path. The transparent latch problem manifests itself as uncertainty, for each latch, in the input closure time and output assertion time. Jouppi presented a scheme to handle systems in which non-overlapping multi-phase clocks control transparent latches. Our algorithms are based upon a systematic approach in which arbitrary clocking schemes are modelled correctly.
3 Problem Definition
The following assumptions are made concerning the network of combinational logic and synchronising elements to be analysed:
• Across all switching elements data flow is from input terminals to output terminals;
• There are no directed cycles within any portions of combinational logic;
• All synchronising elements have the following three terminals: Data input; Control input; Data output. The data input determines the output value, the control input signal determines the output timing.
• The signal connected to the control input of every synchronising element is a monotonic combinational logic function of exactly one clock signal. This means that the control signal, when enabled, always switches in the same direction as the clock signal, or else always switches in the opposite direction.
Synchronising elements with further terminals (control-bar and output-bar) can be handled, but for purposes of explanation we will consider only the three terminals given above.
In addition to the assumptions concerning the network, we assume that the operation is synchronous. By this, we mean that all clock waveforms have harmonically related frequencies, and there is an overall period which is an integer multiple of the period of each clock signal. We will define the system timing analysis problem in terms of a comparison between the behaviour of a system and its “intended behaviour”. The intended behaviour is defined as the behaviour exhibited by an ideal system comprising the same network topology and controlled by the same clock signals, but in which:
• All synchronising elements switch with zero delay;
• All combinational logic switching paths from clock signal sources to the control inputs of synchronising elements switch with zero delay;
• All other combinational logic switching paths switch with arbitrarily small, but finite, delays (delays tending to zero).
This definition
of intended behaviour is a little different to Szymanski’s [9] “correct behaviour”,
which is
that exhibited when the clock is run slowly enough.
The usual timing analysis task is to find tight upper
bounds upon settling times of the transitions at all network nodes. The settling times are often referred to as
the signal ready times. Required
times can also be
found. Required times can be traced backwards across
network components in just the same way that ready
times can be traced forwards.
For any combinational logic path from node x to node y the difference between the required time at node y and the signal ready time at node x gives an upper bound constraint upon the path propagation delay (sum of component propagation delays). If the constraint is satisfied, then the path is fast enough. Otherwise the path is too slow. Notice that speeding up a path that is already fast enough can not reduce the size of violations of any of these constraints. However, speeding up a path that is too slow will always reduce the size of a violation.
An interesting feature of the definition of “too slow”
is that it may apply to a set of combinational
logic
paths that form a directed cycle traversing two, or more,
transparent latches.
The algorithms to be presented solve the following problem:
Given a network of combinational logic and synchronising elements, conforming to the assumptions given above, and descriptions of the clock signals:
i) Find all paths that are too slow;
ii) For all nodes in paths that are too slow find the ready and required times. For all other nodes find times between the ready and required times such that for any two nodes in a combinational logic path, the difference exceeds the path delay.
For any combinational
logic path (or portion of) the
times generated indicate the speed-up required to make
a slow path just fast enough, or else bound the degree
to which a path may be slowed down.
4 Approach
In this section we present a number of definitions and a proposition that forms the basis of our slow-path algorithm.
Consider the output terminal of a synchronising element. The time at which the output signal appears, or is updated, is its assertion time. At the data input of a synchronising element, the time after which an input transition fails to be of use is the input closure time. At the control input of a synchronising element there are (potentially) two voltage transitions for each pulse of the controlling clock signal. The arrival time of the control transition that causes output assertion is the assertion control time. The arrival time of the control transition that causes input closure is the closure control time. In reality these may, or may not, be the same time. For a trailing edge triggered latch, for example, the trailing edge of the control signal causes both input closure and output assertion. However, for a level sensitive transparent latch the leading edge of the control signal causes output assertion and the trailing edge causes input closure. The corresponding times in the associated ideal system are called, respectively, ideal assertion time, ideal closure time, ideal assertion control time and ideal closure control time.
We use a “generic” model of a synchronising element, that is controlled by a single clock pulse during each overall clock period. A synchronising element that is clocked at a frequency that is a multiple, n, of the overall clock frequency is represented by n such elements connected in parallel. To each is assigned one of the n clock pulses that occurs during every overall clock period. In this way all n sets of associated input closure, output assertion and control times are represented.
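The replication of a multiply-clocked element into generic elements can be sketched as follows. This is only an illustration: the names `GenericElement` and `expand_element` are hypothetical, and the sketch assumes that the n pulses are evenly spaced and of equal width, which the text does not require.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GenericElement:
    """One generic element: controlled by a single pulse per overall period."""
    name: str
    leading_edge: float   # leading edge of the assigned pulse, within the overall period
    trailing_edge: float  # trailing edge of the assigned pulse, within the overall period


def expand_element(name, overall_period, n, first_leading_edge, pulse_width):
    """Represent an element clocked n times per overall period by n generic elements.

    Assumption (not from the paper): pulses are evenly spaced and equally wide.
    """
    clock_period = overall_period / n
    elements = []
    for k in range(n):
        lead = (first_leading_edge + k * clock_period) % overall_period
        trail = (lead + pulse_width) % overall_period
        elements.append(GenericElement(f"{name}#{k}", lead, trail))
    return elements


if __name__ == "__main__":
    # A latch clocked at twice the overall frequency becomes two generic elements.
    for element in expand_element("L1", 100.0, 2, 10.0, 20.0):
        print(element)
```

Each generic element then carries its own set of closure, assertion and control times, so the rest of the analysis never needs to know the original clock frequency.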
Transitions at clock generator output terminals are
the clock edge times.
These times are the same in
the actual system and in the associated ideal system.
For a combinational logic path p from synchronising element output x to synchronising element data input y we use the following definition. The ideal path constraint of path p, $D_p$, is given by the time that elapses between the ideal assertion time at x and the very next ideal closure time at y.
A control path is a combinational logic path from the output of a clock signal generator to the control input of a synchronising element. For a control path from clock generator terminal x to control input y the ideal path constraint, $D_p$, is the time that elapses between each controlling clock transition and the ideal closure control or assertion control time. For all control paths $D_p$ is identically zero.
An enable path is a combinational logic path from a synchronising element output to a synchronising element control input. For an enable path from terminal x to terminal y, of synchronising element $\sigma$, the ideal path constraint is the time that elapses between the ideal assertion time at x and one of the following two transitions of the clock signal that controls $\sigma$. The nature of the operation of the synchronising element, and of the enable logic, determines which of the clock edges is to be enabled/disabled.
Here are two example ideal path constraints:
a) p is a combinational logic path from the output of level sensitive latch $\alpha$, synchronised by $\phi_\alpha$, to the data input of level sensitive latch $\beta$, synchronised by $\phi_\beta$. $D_p$ is the time between a leading edge of $\phi_\alpha$ and the next trailing $\phi_\beta$ edge.
b) q is a combinational logic path from the output of trailing edge triggered latch $\gamma$, synchronised by $\phi_\gamma$, to the data input of trailing edge triggered latch $\delta$, synchronised by $\phi_\delta$. $D_q$ is the time between a trailing edge of $\phi_\gamma$ and the next trailing $\phi_\delta$ edge. A special case of this configuration is when $\phi_\delta = \phi_\gamma$. In this case $D_q$ is equal to exactly one $\phi_\delta$ clock period.
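The two examples can be reproduced with a small helper. This is a sketch under stated assumptions: the function name `ideal_path_constraint` is hypothetical, edge times are given within one overall period, and each element is a generic element with one pulse per overall period, so that the constraint lies in the interval (0, T].

```python
def ideal_path_constraint(assertion_time, closure_time, overall_period):
    """Time from an ideal assertion to the very next ideal closure.

    Both times are positions within the overall clock period. A closure that
    coincides with the assertion is taken to be the one a full period later,
    matching the special case of example (b).
    """
    gap = (closure_time - assertion_time) % overall_period
    return gap if gap > 0 else overall_period


if __name__ == "__main__":
    T = 100  # overall period (illustrative value)
    # (a) leading edge of the source clock at 0, trailing edge of the sink clock at 30
    print(ideal_path_constraint(0, 30, T))
    # (b) both latches on the same trailing edge: exactly one period
    print(ideal_path_constraint(40, 40, T))
    # closure edge earlier in the period than the assertion edge: wrap around
    print(ideal_path_constraint(80, 20, T))
```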
Figure 2(a) shows the model of a synchronising element. Notice that the model has two control inputs, two data inputs and two outputs. The two control inputs represent the different control functions of input closure and output assertion. The two data inputs represent the different input closure times that result from closure control and output assertion. The two outputs represent the different output assertion times that result from assertion control and input timing. Assertion time at the actual output is given by the maximum of the two output assertion times. Closure time at the actual input is given by the minimum of the two input closure times.
A real number is associated with every terminal. These are the terminal offsets. Their meanings are as follows. $O_{dz}$ and $O_{zd}$ give the input closure time and corresponding output assertion time. $O_{ac}$ gives the latest assertion control time required to achieve output assertion at time $O_{zc}$. $O_{dc}$ gives the input closure time corresponding to closure control at time $O_{cc}$. $O_{ac}$, $O_{zc}$ and $O_{zd}$ specify absolute times (within the overall clock period) as offsets with respect to the ideal output assertion time. $O_{cc}$, $O_{dc}$ and $O_{dz}$ are offsets with respect to the ideal input closure time.
Figure 2: a) Timing Model Of Synchronising Element. b) Simplified Model.
Our algorithms use the slightly simplified synchronising element model of Figure 2(b). $O_{cc}$ is set to the constant value of zero, which is a lower bound upon the closure control time. $O_{dc}$ is set to the constant value $-D_{setup}$, where $D_{setup}$ is the required set-up time of the element. This guarantees that $\min(O_{dc}, O_{dz})$ will be a lower bound upon the input closure time. The remaining four offsets are as described above.
The behaviour and internal delays of any given synchronising element impose constraints upon the offsets.
These take the form of upper and lower bounds, and of
equalities involving associated pairs of offsets, and are
called the synchronising
element
constraints.
A description of the constraints for edge triggered and transparent latches, together with an example, is given in
section 5.
There are also constraints associated with each combinational logic path. These involve offsets of different synchronising elements. Let p be a combinational logic path from synchronising element output x, with offset $O_x$, to synchronising element input y, with offset $O_y$. Let $dmax_p$ be the largest propagation delay that can occur between the start and end of the path. $dmax_p$ is the sum of the worst (largest) component propagation delays. For path p we have the following path constraint.
$$dmax_p < D_p - O_x + O_y$$
There is one further set of timing constraints. Consider synchronising element $\beta$, controlled by clock signal $\phi_\beta$ with period $T_\beta$. For the system to behave as intended, the signal at the data input, y, must not be updated more than $T_\beta$ before the input closure time. We can represent this requirement with a second constraint for each combinational logic path ending at y. For a path from terminal x to y, with minimum path delay $dmin_p$, the supplementary path constraint is
$$dmin_p > D_p - O_x + O_y - T_\beta$$
If there exists a solution to all of these constraints (synchronising element constraints, path constraints and supplementary path constraints) then the system works as intended.
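The two kinds of path constraint can be checked directly once the offsets are known. The following is a minimal sketch; the function names are hypothetical and the numerical values are illustrative only.

```python
def path_constraint_ok(d_max, D_p, O_x, O_y):
    """Path constraint: the largest path delay must be below D_p - O_x + O_y."""
    return d_max < D_p - O_x + O_y


def supplementary_constraint_ok(d_min, D_p, O_x, O_y, T_beta):
    """Supplementary path constraint: the smallest path delay must exceed
    D_p - O_x + O_y - T_beta, where T_beta is the period of the clock that
    controls the element at the end of the path."""
    return d_min > D_p - O_x + O_y - T_beta


if __name__ == "__main__":
    # Ideal path constraint 30, output offset 5, input offset -2: bound is 23.
    print(path_constraint_ok(20, 30, 5, -2))   # delay 20 is below the bound
    print(path_constraint_ok(25, 30, 5, -2))   # delay 25 is not
    # Sink clocked at twice the overall frequency (T_beta = 50, D_p = 100):
    # a minimum delay of 10 would let data arrive more than T_beta too early.
    print(supplementary_constraint_ok(10, 100, 0, 0, 50))
```

Keeping the two checks separate mirrors the text: the first uses maximum delays and detects slow paths, the second uses minimum delays and detects data that arrives a whole clock period too early.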
With respect to a combinational logic path from a synchronising element output to a synchronising element input the following holds:
Let X be the set of all combinations of offsets that satisfy the synchronising element constraints. Combinational logic path p is too slow if, and only if, either:
• $\forall x \in X$, p does not satisfy its path constraint, or
• $\exists$ a set of paths $Q = \{q_1, q_2, \ldots\}$ s.t. $\forall x \in X$ if p satisfies its path constraint then some $q \in Q$ does not, and if all $q \in Q$ satisfy their path constraints then p does not.
To illustrate this proposition
we will discuss a network in which a transparent latch is interposed between
two portions of combinational
logic. Suppose that by
setting the latch input closure and output assertion
times at the end of the control pulse, the first portion of
logic meets the resulting timing constraints but that the
second portion does not. Then suppose that latch input
closure and output assertion times are moved towards
the beginning of the control. Eventually the second portion of logic meets the corresponding timing constraints,
but by this time the first portion no longer does. Both
paths are then too slow, by the second condition of the
proposition.
In large networks of CMOS logic, for which the behavioural assumptions made in this paper apply, timing problems are almost always due to paths that are too slow. However, even if all paths are fast enough the system may not work as intended due to non-satisfied supplementary path constraints, resulting from badly asymmetric control path delays (e.g. clock skew). Our algorithms do not detect these problems.
5 Synchronising Element Models
In the previous section we introduced the general model of a synchronising element. In this section we present the details of the edge triggered and transparent latch models used in our algorithms.
Trailing edge triggered latch: A trailing edge triggered latch latches its data input and updates its output on the trailing edge of each control pulse. In other words, both input closure and output assertion are controlled by the trailing edge of each control pulse and all four offsets are specified with respect to this edge. For this synchronising element, the timing of the data input and output are independent. This is modeled by setting $O_{dz}$ to zero, so that $\min(O_{dc}, O_{dz}) = O_{dc}$, and by setting $O_{zd}$ to zero so that $\max(O_{zc}, O_{zd}) = O_{zc}$.
Let $D_{setup}$ be the data set-up time and $D_{cz}$ be the delay between the control input and the output, $D_{setup}, D_{cz} \ge 0$. The constraints upon the offsets are:
$$O_{dc} = -D_{setup}; \quad O_{dz} = 0; \quad O_{zd} = 0; \quad O_{ac} \ge 0; \quad O_{zc} = O_{ac} + D_{cz}.$$
Transparent latch: During each pulse at the control input of a level sensitive latch, data may flow from the input to the output. On the trailing edge of the control pulse the input is latched and the output remains static between control pulses. The ideal input closure time is the time of the trailing edge of the clock pulse and the ideal output assertion time is the time of the leading edge.
Figure 3: Relationship Between Offsets For A “Transparent” Synchronising Element. (Time axis marks: leading edge of control pulse, output assertion time, input closure time, trailing edge of control pulse.)
Let $D_{setup}$, $D_{dz}$ and $D_{cz}$ be the data set-up time, and the delays from the data and control inputs to the output, and let W be the width of the control pulse ($D_{setup}, D_{dz}, D_{cz}, W \ge 0$). The constraints upon the offsets are listed below. The last condition is shown graphically in Figure 3.
$$O_{dc} = -D_{setup}; \quad O_{dz} \le -D_{dz}; \quad O_{zd} \ge 0; \quad O_{ac} \ge 0;$$
$$O_{zc} = O_{ac} + D_{cz}; \quad O_{zd} = W + O_{dz} + D_{dz}.$$
As an example, consider a transparent latch, with no internal delays, controlled during each clock period by a 20ns clock pulse. Suppose the output is asserted 5ns after the beginning of the control pulse, then $O_{zd}$ = 5ns and $O_{dz}$ = -15ns. If there is a delay of 2ns between the clock source and the control input of the latch then $O_{ac} = O_{zc}$ = 2ns. These four offsets are consistent with the constraints of the model.
Clocked tristate drivers are modeled. in the same way
as transparent latches.
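The consistency claimed in the worked example can be checked mechanically. The sketch below is an illustration, not part of Hummingbird: the function name is hypothetical, the argument names spell out the offsets (`O_dz` is the data-input offset, `O_zd` the data-driven output offset, `O_ac` and `O_zc` the assertion control and control-driven output offsets, `O_dc` the closure-driven input offset), and the inequalities are assumed to be non-strict.

```python
def transparent_latch_offsets_ok(O_dc, O_dz, O_zd, O_ac, O_zc,
                                 D_setup, D_dz, D_cz, W, tol=1e-9):
    """Check a set of offsets against the transparent latch constraints.

    Assumed form of the constraints (non-strict inequalities):
      O_dc = -D_setup, O_dz <= -D_dz, O_zd >= 0, O_ac >= 0,
      O_zc = O_ac + D_cz, O_zd = W + O_dz + D_dz.
    """
    return (
        abs(O_dc + D_setup) <= tol
        and O_dz <= -D_dz + tol
        and O_zd >= -tol
        and O_ac >= -tol
        and abs(O_zc - (O_ac + D_cz)) <= tol
        and abs(O_zd - (W + O_dz + D_dz)) <= tol
    )


if __name__ == "__main__":
    # Worked example: 20ns pulse, no internal delays, output asserted 5ns
    # after the leading edge, control signal arriving 2ns late.
    print(transparent_latch_offsets_ok(
        O_dc=0.0, O_dz=-15.0, O_zd=5.0, O_ac=2.0, O_zc=2.0,
        D_setup=0.0, D_dz=0.0, D_cz=0.0, W=20.0))
```

The last constraint ties the input and output offsets together: moving the output assertion time earlier in the pulse moves the input closure time earlier by the same amount.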
6 Algorithms
The algorithms presented here find all of the slow paths
in a network conforming
to the assumptions given in
Section 3, and generate timing constraints,
as defined
in Section 3, to guide logic re-synthesis.
For combinational logic path p from synchronising element output x, with offset $O_x$, to synchronising element input y, with offset $O_y$, and having ideal path constraint $D_p$, path slack is defined as $D_p - O_x + O_y - d_p$. This is the amount by which the path constraint is satisfied. For a path that does not satisfy its path constraint the path slack is negative.
Let P be the set of combinational logic paths that emanate from (converge to) synchronising element terminal t. The node slack, $n_t$, at t is defined as $\min_{p \in P} s_p$, where $s_p$ is the path slack of path p. If x is a set of offsets for which the node slack, $n_t$, at node t, is positive, then the path constraints of all paths in P are satisfied. If t is a synchronising element input and the associated offset is decreased by any positive $\delta < n_t$ then the path constraints of all paths in P will still be satisfied. If t is a synchronising element output and the associated offset is increased by any positive $\delta < n_t$ then the path constraints of all paths in P will still be satisfied. If the offset is adjusted by exactly $n_t$ then, for one path in P the path constraint will not be satisfied as there will be an exact equality. If the offset is adjusted by more than $n_t$, then for at least one path in P the path constraint will not be satisfied, nor will there be exact equality.
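Path slack and node slack, computed directly from their definitions, can be sketched as below. The function names and the tuple layout `(D_p, O_x, O_y, d_p)` are assumptions made for illustration; Section 7 explains why a real analyser avoids this kind of path enumeration.

```python
def path_slack(D_p, O_x, O_y, d_p):
    """Amount by which a path constraint is satisfied: D_p - O_x + O_y - d_p."""
    return D_p - O_x + O_y - d_p


def node_slack(paths):
    """Node slack at a terminal: the minimum path slack over all paths that
    emanate from (or converge to) it. Each path is (D_p, O_x, O_y, d_p)."""
    return min(path_slack(*p) for p in paths)


if __name__ == "__main__":
    # Two paths converging on the same element input, whose offset is O_y = -2.
    paths = [(30, 5, -2, 20), (40, 0, -2, 30)]
    n_t = node_slack(paths)
    print(n_t)
    # Decreasing the input offset by delta < n_t keeps every path satisfied.
    delta = 2
    shifted = [(D, Ox, Oy - delta, d) for (D, Ox, Oy, d) in paths]
    print(all(path_slack(*p) > 0 for p in shifted))
```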
We next define the operations of “complete” and “partial” “slack transfer”. These may be seen as the donation of spare time (possibly all of it, in the case of complete slack transfer) by one combinational logic path to an adjacent one. Let x be an input and y be the corresponding output of a simplified synchronising element model. Let $O_x$ and $O_y$ be the associated offsets (which satisfy the synchronising element constraints). Forward slack transfer is achieved by decreasing $O_x$ and $O_y$ by equal positive quantities. Backward slack transfer is achieved by increasing $O_x$ and $O_y$ by equal positive quantities. Let $n_x$ be the node slack at terminal x and m be the maximum decrease allowed in $O_x$ and $O_y$ by the synchronising element constraints. Complete forward slack transfer is the process of decreasing the offsets by $\min(n_x, m)$, provided $\min(n_x, m) > 0$. Partial forward slack transfer is the process of decreasing the offsets by $\min(n_x/n, m)$, provided $\min(n_x/n, m) > 0$, where n is any real number > 1. Complete and partial backward slack transfer are similarly defined. If the relevant inequality does not hold, then the offsets are not adjusted and we say no slack is transferred. If we take any set of offsets that satisfy all synchronising element constraints and let S be the set of paths that satisfy the corresponding path constraints, or for which strict equality holds, then if any one of these slack transfer operations is performed and S' is the corresponding new set of paths, then $S' \supseteq S$.
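Forward slack transfer across a single synchronising element can be sketched as follows. This is a minimal illustration under assumptions: the function name and argument names are hypothetical, `n_x` is taken to be the node slack at the element input, and `m` is supplied by the caller as the largest decrease that the synchronising element constraints permit.

```python
def forward_transfer(O_x, O_y, n_x, m, divisor=None):
    """Complete (divisor=None) or partial (divisor > 1) forward slack transfer.

    Decreases the input offset O_x and the output offset O_y by the same
    amount, min(n_x, m) or min(n_x / divisor, m), provided that amount is
    positive. Returns the new offsets and the amount transferred.
    """
    if divisor is not None and divisor <= 1:
        raise ValueError("partial transfer needs a divisor greater than 1")
    donated = n_x if divisor is None else n_x / divisor
    amount = min(donated, m)
    if amount > 0:
        return O_x - amount, O_y - amount, amount
    return O_x, O_y, 0.0  # no slack is transferred


if __name__ == "__main__":
    # Input node has 6 units of spare time; the element allows a decrease of 10.
    print(forward_transfer(-4.0, 16.0, 6.0, 10.0))               # complete
    print(forward_transfer(-4.0, 16.0, 6.0, 10.0, divisor=2.0))  # partial
    print(forward_transfer(-4.0, 16.0, -1.0, 10.0))              # nothing to give
```

Backward transfer would follow the same pattern with the offsets increased and the node slack taken at the element output.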
Iterations 1 and 2 of Algorithm 1 remove surplus time from paths with positive slack, leaving non-negative slacks. Iterations 3 and 4 return some time to all paths that are fast enough, so that they end up with (strictly) positive slacks. All nodes in paths that are too slow end up with non-positive slacks. Because of the simplified synchronising element model used, nodes in paths that are marginally fast enough may be identified as too slow. Iterations 1 and 2 each complete in a number of cycles at most one more than the number of synchronising elements in a directed path, typically less than ten.
For the description of Algorithm 2 we define one further type of time transfer operation. Slack is “snatched” when a combinational logic path takes time, that it needs, from an adjacent path, regardless of whether the adjacent path can spare it. Precisely, let x be an input and y the corresponding output of a synchronising element. Let $O_x$ and $O_y$ be the associated offsets (that satisfy the synchronising element constraints). Let $n_y$ be the node slack at terminal y and m be the maximum decrease allowed in $O_x$ and $O_y$ by the synchronising element constraints. If $n_y$ is negative, then forward time snatching is achieved by decreasing the offsets by $\min(-n_y, m)$. If $\min(-n_y, m) \le 0$ then the offsets are not adjusted and we say that no time is snatched. Backward time snatching is similarly defined.
Iteration 1 of Algorithm
2 traces signal ready times
forward through the network, stopping when the actual
times have been found for nodes in paths that are too
slow. Iteration 2 traces required times backwards and
stops when the actual times have been found for nodes
in paths that are too slow. For each node not in a path
that is too slow the times generated are an upper bound
on the ready time, and lower bound on the required time
such that the former is smaller than the latter. Each iteration completes in a number of cycles at most one more than the number of synchronising elements in a directed path.
Algorithm 1 (Identification of Slow Paths)
Initialise:
Select any set of offsets satisfying the synchronising element constraints.
Iteration 1:
1a) Find node slack at every synchronising element terminal.
1b) If all slacks > 0 then stop (system behaves as intended).
1c) Perform complete forward slack transfer across all synchronising elements.
1d) If no slack was transferred then go to iteration 2.
1e) Go to 1a.
Iteration 2:
2a) Find node slack at every synchronising element terminal.
2b) If all slacks > 0 then stop (system behaves as intended).
2c) Perform complete backward slack transfer across all synchronising elements.
2d) If no slack was transferred then go to iteration 3.
2e) Go to 2a.
Iteration 3 - Repeat once for each complete backward iteration made:
3a) Find node slack at every synchronising element terminal.
3b) Perform partial forward slack transfer across all synchronising elements.
Iteration 4 - Repeat once for each complete forward iteration made:
4a) Find node slack at every synchronising element terminal.
4b) Perform partial backward slack transfer across all synchronising elements.
Final step:
Find all node slacks.

Algorithm 2 (Timing Constraint Generation)
Initialise:
Use Algorithm 1 to generate initial offsets.
Iteration 1:
1a) Find node slack at every synchronising element terminal.
1b) Snatch time backward across all synchronising elements.
1c) If any time was snatched then go to 1a.
Record ready times at all cell inputs.
Iteration 2:
2a) Find node slack at every synchronising element terminal.
2b) Snatch time forward across all synchronising elements.
2c) If any time was snatched then go to 2a.
Record required times at all cell outputs.
7 Slack Computations
The algorithms presented in Section 6 make use of slack
values at synchronising
element terminal nodes. These
could be calculated directly, as defined. Such a path
enumeration
procedure is computationally
expensive.
Hitchcock [6] introduced the much faster block method.
The disadvantage of the block method is that “false
paths” (i.e. paths that can not actually be sensitised) can not be discarded, and so the generated propagation delays and slacks tend to be pessimistic.
Pessimistic slacks (i.e. too small) are safe, however. As
speed is an important issue for a system timing analyser
to be used in an analysis-redesign
loop, and as we want
to provide timing analysis for systems at arbitrary levels
of abstraction
(not just at the level of the most primitive logic gates) we decided to use the straight block
analysis method for our slack computations.
A cluster is defined as a maximal connected network of combinational logic elements. All inputs to a cluster are synchronising element outputs and all outputs from a cluster are synchronising element inputs. Synchronising element offsets specify assertion times at cluster inputs and closure times at cluster outputs. We start by calling the cluster input assertion times node ready times and calculate ready times for all of the remaining cluster nodes using equation 1. The assumption that there are no directed cycles within any portions of combinational logic guarantees that all of the ready times can be calculated in this way. We calculate a slack value at each cluster output (synchronising element input) as the difference between the closure time and the ready time. Slacks for the remaining nodes in the cluster are then calculated using equation 2. At each circuit node the data required time is given by the node ready time plus the node slack.
(1)
(2)
Ready-time and slack evaluation formulae. I - component inputs, Z - component outputs, R - ready time, S - slack, P - input to output propagation delay.
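The cluster analysis described above can be sketched in code. Equations 1 and 2 are not reproduced in this text, so the sketch assumes the usual block-method recurrences: the ready time at a node is the maximum over its fan-in of (ready time plus propagation delay), and the slack at a node is the minimum over its fan-out of (slack plus ready time at the fan-out node, minus the node's own ready time, minus the propagation delay). The function name `block_analysis` and the arc-list representation are hypothetical.

```python
from collections import defaultdict, deque


def block_analysis(arcs, input_ready, output_closure):
    """Block-method ready times, slacks and required times for one cluster.

    arcs: list of (src, dst, delay) for an acyclic cluster.
    input_ready: ready (assertion) time at every cluster input.
    output_closure: closure time at every cluster output.
    """
    succ, pred, indeg = defaultdict(list), defaultdict(list), defaultdict(int)
    nodes = set(input_ready) | set(output_closure)
    for src, dst, delay in arcs:
        succ[src].append((dst, delay))
        pred[dst].append((src, delay))
        indeg[dst] += 1
        nodes.update((src, dst))

    # Topological order (Kahn's algorithm); fails if there is a directed cycle.
    queue = deque(n for n in nodes if indeg[n] == 0)
    order = []
    while queue:
        n = queue.popleft()
        order.append(n)
        for dst, _ in succ[n]:
            indeg[dst] -= 1
            if indeg[dst] == 0:
                queue.append(dst)
    if len(order) != len(nodes):
        raise ValueError("combinational logic contains a directed cycle")

    # Forward pass: ready times.
    ready = dict(input_ready)
    for n in order:
        if pred[n]:
            ready[n] = max(ready[s] + d for s, d in pred[n])

    # Backward pass: slacks, starting from closure minus ready at the outputs.
    slack = {n: output_closure[n] - ready[n] for n in output_closure}
    for n in reversed(order):
        if n not in slack:
            slack[n] = min(slack[z] + ready[z] - ready[n] - d for z, d in succ[n])

    required = {n: ready[n] + slack[n] for n in nodes}
    return ready, slack, required


if __name__ == "__main__":
    arcs = [("a", "n", 3), ("b", "n", 1), ("n", "y", 4)]
    ready, slack, required = block_analysis(arcs, {"a": 0, "b": 1}, {"y": 10})
    print(ready, slack, required)
```

Each arc is visited once in each direction, which is what makes the block method so much cheaper than enumerating paths.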
For a cluster within a system incorporating
assorted
types of synchronising
elements, and controlled by different clocks, the cluster input assertion times and output closure times are defined as offsets with respect to
various different reference times (the ideal assertion and
closure times). In order to perform the above cluster
analysis, it is necessary for all of these times to be known
with reference to the same point in time.
We can visualise the process of converting all of the values to offsets from the same reference point as follows: First it is necessary to “break open” the clock period. This will produce an interval of time, of one overall period in length, in which the locations of all of the ideal assertion and closure times (i.e. clock edges) are well defined and into which we can easily place all of the input assertion times and output closure times.
We then choose any point for use as a common reference time against which to state all of the assertion and
closure times.
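The conversion to a common reference can be illustrated with a short Python sketch. It is an illustration under stated assumptions rather than the authors' code: the function names are hypothetical, the break is placed immediately before a chosen clock edge, and the common reference point is taken to be the start of the broken-open interval (the text allows any point).

```python
def break_open(edge_times, period, first_edge):
    """Return each clock edge's position within the broken-open period.

    edge_times: ideal time of each clock edge within one overall period.
    first_edge: the edge that appears first once the period is broken open.
    """
    origin = edge_times[first_edge]
    return {name: (t - origin) % period for name, t in edge_times.items()}


def to_common_reference(positions, ideal_edge, offsets):
    """Restate per-node offsets as times from the common reference point.

    ideal_edge: the clock edge giving each node's ideal assertion/closure time.
    offsets: each node's offset relative to that ideal time.
    """
    return {node: positions[ideal_edge[node]] + offsets[node]
            for node in offsets}
```

For a period of 100 with edges A, B, C and D at times 0, 25, 50 and 75, breaking the period open just before edge C places the edges in the order C, D, A, B at positions 0, 25, 50 and 75.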
The problem lies in deciding where to break open the
clock cycle. A bad choice results in a path always having
an input assertion time after its output closure time. To
decide where to break open the clock cycle it is necessary
to look at the assumption of intended behaviour and the
relationships that this imposes between pairs of clock
edges. From the assumption of intended behaviour we
know that: i) control paths have ideal path constraints
of exactly zero; ii) all other paths have ideal path delays
that are strictly positive and at most one overall
clock period.

Figure 4: Directed Graph Example.
a) Clock waveforms; b) Directed graph representing clock; c) Cluster annotated with ideal input assertion times and ideal
output closure times; d) Directed graph completed for
this cluster.
It is not always possible to break open the clock cycle
so that the resulting time between ideal input assertion
time and ideal output closure time is non-negative for
all paths through the cluster. Figure 1 gave a simple
case in which two cluster analysis passes are required.
As a pre-processing stage, we decide how many analysis passes
are needed for each cluster, where to break
open the clock period for each pass and, for each cluster
output, which analysis applies. During an analysis that
does not apply to a specific output we set the node slack
to a large number before performing the slack calculations.
Having completed all of the necessary passes, the
smallest slack seen at each node is the required node
slack.
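The way in which the results of several passes are combined can be sketched as below. This is a hedged illustration only: the function names are hypothetical, and floating-point infinity stands in for the "large number" mentioned above.

```python
LARGE_SLACK = float("inf")  # assumed stand-in for "a large number"


def output_slacks_for_pass(closure, ready, applicable):
    """Initial output slacks for one pass.

    Outputs to which this pass does not apply receive a large slack so
    that they cannot determine the minimum at any upstream node.
    """
    return {z: (closure[z] - ready[z]) if z in applicable else LARGE_SLACK
            for z in closure}


def combine_passes(slacks_per_pass):
    """Keep the smallest slack seen at each node over all passes."""
    combined = {}
    for slacks in slacks_per_pass:
        for node, value in slacks.items():
            combined[node] = min(value, combined.get(node, LARGE_SLACK))
    return combined
```

Because a non-applicable output never yields the minimum, each node's final slack is determined only by the outputs analysed in a pass that applies to them.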
We next present an algorithm for selecting a set,
of smallest possible size, of passes necessary to find all
slacks. A directed graph is constructed to represent the
sequence of occurrence of the clock edges. Parts (a) and
(b) of Figure 4 show a set of clock waveforms and the
corresponding graph. Each of the ways in which it is
possible to break open the clock period is represented
by the removal of a single arc. Next we consider all cluster
input-output combinations between which switching
paths exist (other than control paths) and represent the
clock edge ordering required (for the ideal path constraints
to be strictly positive) by adding extra arcs to
the graph. Figure 4(c) shows an example of a small cluster
annotated with the names of the clock edges that are
the ideal assertion and closure times. Figure 4(d) gives
the completed directed graph for this cluster.
A broken open clock period that satisfies the requirement
represented by an extra arc is one that is
represented by the removal of an original arc that appears
after the head, and before the tail, of the extra
arc. For example, the requirement that edge E occur
before edge C is satisfied by the broken open period
represented by removal of the original arc from node
D to node E. The clock edges then occur in the order
E - F - G - H - A - B - C - D, in which edge
E is before edge C. The minimum sized set of analysis
passes required is represented by the minimum sized
set of arcs that has to be removed from the underlying
clock graph so that one member lies between the head
and tail of every added arc. We find such a set by
exhaustive search of the graph, starting with the removal of
each single original arc, then trying all possible pairs,
and so on until the above condition is satisfied. The
graphs are usually small and very seldom is it necessary
to remove more than two arcs.

Algorithm 3 (Analysis-Redesign Loop)
Synthesise initial area optimised combinational logic modules.
Until all paths are fast enough:
    Perform timing analysis to identify all paths that are too slow;
    Provide input data ready times and output required times for all combinational logic modules traversed by paths that are too slow;
    Select one such module and speed up slow paths.
For each cluster output we find the broken open
clock period within which its ideal closure time appears
closest to the end. The output node slack is calculated
during the corresponding cluster analysis pass.
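The exhaustive search for a smallest set of arcs to remove, described in this section, might be sketched as follows. The sketch makes several assumptions that are not from the paper: the clock graph is given as a list of edge names in cyclic order, each ordering requirement is a pair of distinct edges (earlier, later), and the function names are hypothetical. Removing the original arc that leaves the edge at index i breaks the period open so that the next edge comes first.

```python
from itertools import combinations


def satisfies(n, removed_arc, index, earlier, later):
    """True if removing one original arc orders `earlier` before `later`.

    removed_arc: index i of the original arc from edge i to edge i + 1
    (cyclically). The broken-open period then starts at edge i + 1.
    """
    start = (removed_arc + 1) % n
    return (index[earlier] - start) % n < (index[later] - start) % n


def minimum_breaks(edges, requirements):
    """Smallest set of original arcs covering every ordering requirement.

    edges: clock edge names in their cyclic order of occurrence.
    requirements: (earlier, later) pairs, one for each added arc.
    Returns the removed arcs as (tail, head) pairs, one per analysis pass.
    """
    n = len(edges)
    index = {name: i for i, name in enumerate(edges)}
    # Try all single arcs, then all pairs, and so on.
    for size in range(1, n + 1):
        for arcs in combinations(range(n), size):
            if all(any(satisfies(n, a, index, u, v) for a in arcs)
                   for u, v in requirements):
                return [(edges[a], edges[(a + 1) % n]) for a in arcs]
    return None
```

For the eight edges A to H and the single requirement that E occur before C, the search returns the arc from C to D, which is the first arc it tries that satisfies the requirement; the arc from D to E used in the example above satisfies it equally well. Two opposed requirements, such as A before C and C before A, cannot be met by one broken-open period and so yield two arcs, that is, two analysis passes.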
8
Implementation and Results
The algorithms described here have been implemented
in the computer program “Hummingbird”,
written using the C programming language, which interfaces
with other programs in the Berkeley Synthesis System
via the OCT data base.
All experiments so far have been with networks of
standard cells. Propagation delays for the standard cells
have been estimated using delay evaluation expressions
that take into account the connected loads. For combinational logic modules the delays have been combined
to generate estimates of the module propagation delays.
Hummingbird has an interactive mode in which, for
example, changes may be made to the shapes of the
clock waveforms to determine the effect on system timing.
Adjustments may also be made to component delays.
One option that users have is to flag all slow paths
in the OCT data base. If the design has been placed
and routed, the slow paths may then be viewed during
a VEM graphical editing session.
Algorithm 3 shows how we propose to automate the
analysis/re-design process. Singh et al. [1] have shown
how to choose the combinational logic module that has
most potential for speed up to meet timing constraints,
and also how to achieve the speed up.
Table 1 shows the run times for a number of examples. DES is a complete data encryption chip, made up
from 3681 standard cells. ALU is a portion of a CPU
chip made up from 899 standard cells. SMlF is a 12 bit
finite state machine described as a “flattened” network
of standard cells. SMlH is a “hierarchical” description
of the same machine in which the combinational logic is
contained in a single module. Pre-processing times include the times taken for generating combinational logic
clusters and for performing the algorithm described in
Section 7. The analysis times are the times taken to
perform Algorithm 1. Data input and output times are
not shown. We point out that the number of iterations
required, and hence the run times, depend upon the
specified clock speeds.

Table 1: Run times in VAX 8800 cpu seconds.
9
Conclusion
We have described the need for system timing analysis
in a logic synthesis environment and have noted the need
to correctly model static CMOS logic synchronised by
complicated multi-frequency clocking schemes. A new
systematic approach to system timing analysis has been
proposed and fast new algorithms have been presented.
The algorithms identify all paths that are “too slow”
and provide timing constraints for use by a combinational
logic re-synthesis program. A new feature is that
the minimum number of voltage settling times are calculated for nodes of combinational networks with input
transitions controlled by different clock signals. The algorithms presented have been implemented in the computer program Hummingbird. Run-time statistics have
been provided, and indicate that the method is, indeed,
very fast.
Acknowledgement
Early discussions with Professor Robert Brayton were influential
in determining the direction of this research. The authors
also wish to acknowledge many useful conversations with Kanwar Jit
Singh and with Gary Gannot, of Intel. This research
has been funded by D.A.R.P.A. grant N00039-87-C0182,
by the MICRO program of the State of California,
Hughes Aircraft, Intel and Rockwell.
References
[1] K. J. Singh, A. R. Wang, R. K. Brayton, and A. Sangiovanni-Vincentelli. Timing optimization of combinational logic. In International Conference On Computer-Aided Design, IEEE, 1988.
[2] J. K. Ousterhout. A switch-level timing verifier for digital mos vlsi. IEEE Transactions on Computer-Aided Design, CAD-4, No. 3, July 1984.
[3] S. H. Hwang, Y. H. Kim, and A. R. Newton. An accurate delay modeling technique for switch-level timing verification. In 23rd Design Automation Conference, ACM IEEE, 1986.
[4] G. De Micheli. Performance-oriented synthesis in the yorktown silicon compiler. In International Conference On Computer-Aided Design, IEEE, 1986.
[5] T. M. McWilliams. Verification of timing constraints on large digital systems. In 17th Design Automation Conference, ACM IEEE, 1980.
[6] R. B. Hitchcock, Sr. Timing verification and the timing analysis program. In 19th Design Automation Conference, ACM IEEE, 1982.
[7] L. C. Bening, A. L. Alexander, and J. E. Smith. Developments in logic network path delay analysis. In 19th Design Automation Conference, ACM IEEE, 1982.
[8] D. E. Wallace and C. H. Sequin. Atv: an abstract timing verifier. In 25th Design Automation Conference, ACM IEEE, 1988.
[9] T. G. Szymanski. Leadout: a static timing analyzer for mos circuits. In International Conference On Computer-Aided Design, IEEE, 1986.
[10] N. P. Jouppi. Timing analysis for nmos vlsi. In 20th Design Automation Conference, ACM IEEE, 1983.