The Hierarchical Timing Pair Model: Conference Paper


See discussions, stats, and author profiles for this publication at: https://www.researchgate.net/publication/221377182

The hierarchical timing pair model

Conference Paper · May 2001


DOI: 10.1109/ISCAS.2001.922061 · Source: DBLP


3 authors:

Nitin Chandrachoodan Shuvra S. Bhattacharyya


Indian Institute of Technology Madras University of Maryland, College Park

K. J. Ray Liu
University of Maryland, College Park

All content following this page was uploaded by Shuvra S. Bhattacharyya on 22 May 2014.



THE HIERARCHICAL TIMING PAIR MODEL

Nitin Chandrachoodan, Shuvra S. Bhattacharyya† and K. J. Ray Liu

Department of Electrical and Computer Engineering,
University of Maryland, College Park, MD 20742
{nitin,ssb,kjrliu}@eng.umd.edu

ABSTRACT

We present a new model for representing timing information for functions in High-Level Synthesis (HLS). We identify shortcomings of the conventional timing model, which is a very simple model derived from the combinational logic model, and show that our new model overcomes many of these defects. In particular, we are able to provide a unified timing model that describes hierarchical combinational and iterative circuits and provides a compact representation of the information, that can be used to streamline system performance analysis.

We present experimental results that demonstrate the effectiveness of our new approach, and describe an efficient algorithm to easily compute the required timing parameters from a description of the graph.

1. INTRODUCTION

High-Level Synthesis (HLS) refers to the task of constructing an architecture, binding and schedule for an algorithm that has been described at a high level of abstraction. The algorithm is usually represented as a dataflow graph whose vertices represent functions and edges represent communication or dependencies. To map such a dataflow graph onto an architecture (either hardware or software) efficiently, we need to annotate the application specification and architecture with information about the execution times of vertices, and the area utilization and power consumption of processing resources. The timing information is used to generate a set of constraints related to the system that the actual implementation must satisfy.

The conventional model for describing timing in this context is derived from the method used in combinational logic analysis. Here each vertex is assigned a single value (called the "propagation delay") representing the maximum delay among all its input-output pairs.

An important requirement of a timing description is the ability to represent systems hierarchically. For example, Fig. 1 shows the circuit of a full adder. If we were to consider this as part of a larger system (say a 4-bit adder made of 4 full adders), we would prefer to use the timing information for the hierarchical block view, rather than the expanded gate-level view. The reason for this is that algorithms for path length computations are typically $O(|V||E|)$ where $|V|$ is the number of vertices and $|E|$ is the number of edges in the graph. The hierarchical view uses 1 vertex and 5 edges to represent the same information as the expanded graph that has 5 vertices and 14 edges. In large systems, the savings offered by using hierarchical representations are essential to retaining tractability. A hierarchical representation would also be very useful in commonly used sequential circuits such as digital filter implementations.

Figure 1: (a) Full adder circuit. (b) Hierarchical block view.

A major disadvantage of the conventional approach is that it does not allow a hierarchical description of the system timing when the system contains delay elements (iterative systems) such as the digital filter mentioned above. These delay elements roughly correspond to registers in a hardware implementation, but are more flexible in that they do not impose the restriction that all the delay elements are activated at the same instant of time [1, 2, 3]. This allowance for variable phase clocking is an important way in which HLS differs from combinational logic implementation. The rephasing optimization in [1] provides a good example of how this can be used. Even in sequential logic synthesis, variable phase clocking has been considered in such forms as clock skew optimization [4] and shimming delays [5].

To the best of our knowledge, there does not appear to be any other timing model that addresses this issue. Using conventional models, a complicated subsystem containing sequential elements will need to be represented in full in the context of the overall system design, rather than using a more convenient condensed description of the timing parameters alone.

In this paper, we propose a different timing model that overcomes these difficulties. By introducing a slightly more complex data structure that allows for multiple input-output paths with differing numbers of delay elements, we are able to provide a single timing model that can describe both purely combinational and iterative systems. For purely combinational systems, the model reduces with minimal overhead to the existing combinational logic timing model. Further details are also available in [6].

Our model provides compact representations of the timing data for large systems. We have used the ISCAS 89/93 benchmark circuits to test our ideas and have obtained promising results.

This research was supported in part by the US National Science Foundation Grant #9734275 and NSF NYI Award MIP9457397.
†Also with the University of Maryland Institute for Advanced Computer Studies.

0-7803-6685-9/01/$10.00 ©2001 IEEE
In the next section, we discuss the requirements that a timing model must meet, and examine some of the shortcomings of the conventional model. Section 3 then presents a new model that overcomes these defects, and explains how it can be efficiently stored and manipulated. Section 4 presents the results obtained by applying the new technique to benchmark circuits. Finally, we present our conclusions and some interesting directions for future research.

2. REQUIREMENTS OF A TIMING MODEL

In order to understand the requirements of timing descriptions for hierarchical systems, it is first useful to clarify certain assumptions that are made in describing simple combinational systems.

First, the combinational delay of a system is the maximum delay between any input/output pair in the system. So after the inputs are stable, we can wait for the amount of time specified by this delay, and be sure that the output is stable. In some cases, especially in HLS, the time is in terms of integer multiples of a system clock.

For multiple-input-multiple-output (MIMO) systems, we assume that the inputs are synchronized so that the overall system can be treated as single-input-single-output (SISO). This is a common assumption in combinational timing models [5, 7]. To see why, consider for example the full adder circuit from Fig. 1. Since the output depends on all the inputs, it is acceptable to assume that the computation starts only after all inputs are available, thus synchronizing the inputs. It is clear that this assumption breaks down when different outputs do not depend on all inputs, but in most cases, this is considered an acceptable tradeoff as it reduces the complexity of the analysis.

For dataflow graphs used in HLS, we use essentially the same combinational timing model that is described above. Delay elements, however, are treated differently [1, 2, 3]. In sequential logic circuits, all delays are treated as flip-flops that are triggered on a common clock edge. In HLS scheduling, we assume no such restriction on the timing of delays. We assume that each functional unit can be started at any time (possibly by providing a start signal).

Now we can see what exactly are the uses of a timing model. The timing information associated with a block is used primarily for the purpose of establishing constraints on the earliest time that the successor elements of the block can start operating (i.e. when its outputs become stable once the inputs are applied). By using these constraints, additional metrics can be obtained relating to the throughput and latency of the system, such as the iteration period bound, which is the same as the maximum cycle mean [8] for single rate graphs. The constraints are used for determining the feasibility of different schedules of the system, where a schedule consists of an ordering of the vertices on resources that can provide the required functionality.

3. THE HIERARCHICAL TIMING PAIR MODEL

Having identified the requirements of a timing model and the shortcomings of the existing model, we can now use Fig. 2 to illustrate the ideas behind the new model for timing. In this figure, we use $t_{*}$ to refer to the propagation delay of a block, and $x_{*}$ to refer to the start time of the block. $T$ is the iteration interval (clock period for the delay elements).

Figure 2: Timing of complex blocks

To provide timing information for a complex block, we should be able to emulate the timing characteristics that this block would imply between its input and output. To clarify this idea, consider the block in Fig. 2. If we were to write the constraints in terms of the internal blocks $x_i$ and $x_o$, we would obtain

$x_i - x_1 \ge t_1; \quad x_o - x_i \ge t_i - 1 \times T; \quad x_2 - x_o \ge t_o$

Now we would like to compute certain information such that if we were to combine the complex block $B$ under the single start time $x_b$, we would still be able to write down equations that would provide the same constraints to the environment outside the block $B$. We see that this is achieved by the following constraints:

$x_b - x_1 \ge t_1; \quad x_2 - x_b \ge t_i + t_o - 1 \times T$

In other words, if we assume that the execution time of the block $B$ is given by the expression $t_i + t_o - 1 \times T$, we can put down constraints that exactly simulate the effect of the complex block $B$.

In general, consider a path from input $v_i = v_1$ to output $v_o = v_k$ through vertices $\{v_1, \ldots, v_k\}$ given by $p : v_1 \to v_2 \to \cdots \to v_k$, with edges $e_i : v_i \to v_{i+1}$. Let $t_i$ be the execution time (propagation delay assuming it is a simple combinational block) of $v_i$, and let $d_j$ be the number of delays on edge $e_j$. Now we can define the constraint time of this path as

$t_c(p) = \sum_{i=1}^{k} t_i - T \times \sum_{j=1}^{k-1} d_j$

We use the term "constraint time" to refer to this quantity because it is in some sense very similar to the notion of the execution time of the entire path, but at the same time is relevant only within the context of the constraint system it is used to build. Also, we use the term $c_p$ to refer to the sum $\sum_{i=1}^{k} t_i$, and $m_p$ to refer to the sum $\sum_{j=1}^{k-1} d_j$. The ordered pair $(m_p, c_p)$ is referred to as a timing pair.

We therefore see that by obtaining the pair $(m_p, c_p)$ (in the example of Fig. 2, $c_p = t_i + t_o$ and $m_p = 1$), we can derive the constraints for the system without needing to know the internal construction of $B$.

We can understand the constraint time as follows: if we have a SISO system with an input data stream $x(n)$ and an output data stream $y(n) = 0.5 \times x(n-1)$, the constraint time through the system is the time difference between the arrival of $x(0)$ on the input edge and the appearance of $y(0)$ on the corresponding output edge. This is very similar to the definition of pairwise latencies in [1]. It is obvious that $y(0)$ can appear on its edge before $x(0)$, since $y(0)$ depends only on $x(-1)$ which (if we assume that the periodicity of the data extends backwards as well as forwards) would have appeared exactly $T$ before $x(0)$. So the constraint time through this system is $t_m - T$, where $t_m$ is the propagation delay of the unit doing the multiplication by 0.5 and $T$ is the iteration period of the data on the system. This number can be negative, and in fact, depends on the value chosen for $T$.

This description of timing pairs makes it clear that the actual constraint time of a path through the graph depends on the iteration interval $T$. In particular, for different values of $T$, it is possible that a different path through the circuit results in the largest constraint time. In other words, the longest path through the graph now depends on $T$. As a result, we need to efficiently compute and store enough information about all input-output paths so that we can easily find the actual value of the largest constraint time between the input and output.

An example of this is seen in Fig. 3, which shows a second-order filter section [3]. Here $P_1$ and $P_2$ are distinct I-O paths. Let the execution time for all multipliers be 2 time units and for adders be 1 time unit, except for one of the adders, which is 2 time units. In this case, for an iteration period ($T$) between 3 and 4, $P_2$ is the dominant path, while for $T > 4$, $P_1$ is the dominant path. So we now need to store both these $(m_p, c_p)$ values.

Figure 3: Second order filter section [3].

We therefore end up with a list of timing pairs that model the timing of the circuit. The actual constraint time of the overall system can then be readily computed by traversing this list to find the maximum path constraint time. The size of the list is bounded above by the number of delays in the system ($|D|$).

We now have a model where the timing pairs that we defined above can be used to compute a constraint time on a system, which can be used in place of the execution time of the system in any calculations. This model is now capable of handling both combinational and iterative systems, and can capture the hierarchical nature of these systems easily. We therefore refer to it as the Hierarchical Timing Pair (HTP) Model.

This definition of constraint time also results in a simple method for determining the iteration period or maximum cycle mean of the graph. Lawler's method [9] combined with the adaptive negative cycle detection techniques from [8] provides an efficient method of computing the maximum cycle mean of the system, since it operates by fixing $T$ and testing the system for consistency, using a binary search to iteratively improve the estimated value of $T$. Because the constraint time of each path depends on the iteration period which is as yet unknown, it is not obvious how other algorithms for the MCM can be extended to this model.

3.1. Data structure and Algorithms

We now present an efficient algorithm to compute the list of timing pairs associated with a system.

Consider a system where there are two distinct I-O paths $P_1$ and $P_2$, with corresponding timing pairs $(m_{p_1}, c_{p_1})$ and $(m_{p_2}, c_{p_2})$. Table 1 shows how the two paths can be treated based on their timing pair values. We have assumed without loss of generality that $m_{p_1} \ge m_{p_2}$. The minimum iteration interval allowed on the system is denoted $T_0$. This would normally be the iteration period bound of the circuit, but may be set to a higher positive value for design safety margins.

Table 1: Tests for dominance of a path.

The conditions from Table 1 can be used to find which timing pairs are necessary for a system and which can be safely ignored. For the example of Fig. 3, $P_1$ has the timing pair (0, 3) while $P_2$ has (1, 7) with timing as assumed in section 3. Thus from condition 2 above, $P_2$ will dominate for $3 < T < 4$, and $P_1$ will dominate for $T \ge 4$.

The algorithm we use to compute the timing pairs is based on the Bellman-Ford algorithm for shortest paths in a graph. We have adapted it to compute the longest path information we require, while simultaneously maintaining information about multiple paths through the circuit corresponding to different register counts.

Algorithm 1 relax-edge
Input: edge e : u → v in graph G; t(u) is the execution time of source vertex u, d(e) is the number of delays on edge e; list(u), list(v) are timing pair lists.
Output: Use the conditions from Table 1 to modify list(v) using elements of list(u). Return TRUE if a modification was made, else return FALSE
  RELAXED ← FALSE
  for all timing pairs t_p from list(u) do
    t_p ← t_p + (t(u), d(e))
    if t_p dominates an element of list(v) then
      insert t_p, adjust list(v)
      RELAXED ← TRUE
    end if
  end for
  return RELAXED

Algorithm 1 implements the edge relaxation step of the Bellman-Ford algorithm [10, p.520]. However, since there are now multiple paths (with different delay counts) to keep track of, the algorithm handles this by iterating through the timing pair lists that are being constructed for each vertex. An important point to note here is that the constraint time around a cycle is always negative for feasible values of $T$, so the relax-edge algorithm will not send the timing pair computations into an endless loop.

Using Algorithm 1, the overall timing pairs are easily computed using the Bellman-Ford algorithm [10, p.531]. The complexity of the overall algorithm is $O(|D||V||E|)$ where $|D|$ is the number of delay elements in the graph (therefore a bound on the length of a timing pair list of a vertex), $|V|$ is the number of vertices, and $|E|$ is the number of edges in the graph. Note that $|D|$ is quite a pessimistic estimate, since it is very rare for all the delays in a circuit to be on any single dominant path from input to output.

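
The relax-edge step and the surrounding Bellman-Ford style iteration of Section 3.1 can be sketched in Python as follows. The contents of Table 1 are not available in this text, so the sketch substitutes a dominance test derived directly from the constraint time $c_p - T \times m_p$: a pair is treated as redundant when another pair has a constraint time at least as large for every $T \ge T_0$. That test, the function names, the `max_passes` guard, and the three-vertex example graph are all assumptions made for illustration, not the authors' implementation.

```python
from typing import Dict, List, Tuple

TimingPair = Tuple[int, float]   # (m_p, c_p): delay count, summed execution time
Edge = Tuple[str, str, int]      # (u, v, number of delays on the edge u -> v)


def covers(a: TimingPair, b: TimingPair, T0: float) -> bool:
    """Assumed dominance test: a's constraint time >= b's for every T >= T0."""
    (ma, ca), (mb, cb) = a, b
    return ma <= mb and ca - ma * T0 >= cb - mb * T0


def relax_edge(edge: Edge, t: Dict[str, float],
               lists: Dict[str, List[TimingPair]], T0: float) -> bool:
    """Extend every pair of list(u) along the edge and merge it into list(v)."""
    u, v, d = edge
    relaxed = False
    for m, c in list(lists[u]):              # snapshot, in case u == v
        cand = (m + d, c + t[u])
        if any(covers(old, cand, T0) for old in lists[v]):
            continue                         # cand adds no new information
        lists[v] = [old for old in lists[v] if not covers(cand, old, T0)]
        lists[v].append(cand)
        relaxed = True
    return relaxed


def timing_pairs(vertices: List[str], edges: List[Edge], t: Dict[str, float],
                 source: str, sink: str, T0: float,
                 max_passes: int = 1000) -> List[TimingPair]:
    """Repeat passes over all edges until no timing pair list changes."""
    lists: Dict[str, List[TimingPair]] = {v: [] for v in vertices}
    lists[source] = [(0, 0.0)]
    for _ in range(max_passes):
        changed = False
        for e in edges:
            changed = relax_edge(e, t, lists, T0) or changed
        if not changed:
            # Lists hold sums up to the start of each vertex; add the sink's time.
            return sorted((m, c + t[sink]) for m, c in lists[sink])
    raise ValueError("no fixed point reached; T0 may be below the iteration period bound")


# Hypothetical graph: a direct path, a path through one delay, and a feedback edge.
V = ["src", "mid", "out"]
t = {"src": 2.0, "mid": 4.0, "out": 1.0}
E = [("src", "out", 0), ("src", "mid", 0), ("mid", "out", 1), ("out", "mid", 2)]

print(timing_pairs(V, E, t, "src", "out", T0=3.0))
print(timing_pairs(V, E, t, "src", "out", T0=4.0))
```

With $T_0 = 3$ both pairs (0, 3) and (1, 7) survive at the output, whereas with $T_0 = 4$ only (0, 3) is kept, since the one-delay path can no longer exceed it. The feedback edge yields only pairs that are already covered, which reflects the remark in the text that the constraint time around a cycle is negative for feasible $T$.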
4. RESULTS

As was mentioned in the introduction, the main benefit of the model we propose is in the ability to hierarchically model systems. Also, the model allows us to represent all relevant input-output paths using a list of timing pairs as described in sec. 3.1. So a suitable measure of the merit of the system would be to see the size (number of elements) of the list required to represent the timing behavior of a graph.

We have run the algorithm described in section 3.1 on the ISCAS 89/93 benchmarks. A total of 44 benchmark graphs were considered. For this set, the average number of vertices is 3649.86, and the average number of output vertices in these circuits is 39.36.

First we consider the case where synchronizing nodes were used to convert the circuit into an SISO system. We are interested in the number of elements that the final timing list contains, since this is the amount of information that needs to be stored. Table 2 shows the breakup of the number of list elements. We find that the average number of list elements is 1.89.

# timing pairs |  1 |  2 |  3 |  4 |  5
# circuits     | 21 | 13 |  5 |  4 |  1

Table 2: Number of dominant timing pairs computed for ISCAS benchmark circuits.

Next, instead of assuming complete synchronization, we considered the case where inputs are synchronized, and measured the number of list elements at each output. The number of distinct values obtained for this was an average of 14.73. If we make an additional assumption that if two list elements have the same $m_p$ they are the same, this number drops to 3.68. This assumption makes sense when we consider that several outputs in a circuit pass through essentially the same path structures and delays, but may have one or two additional gates in their path that creates a slight and usually ignore-able difference in the path length. For example, the circuit s386 has 6 outputs. When we compute the timing pairs, we find that 3 have an element with 1 delay, and the corresponding pairs are (1, 53), (1, 53), (1, 57). Thus instead of 3 pairs, it seems reasonable to combine the outputs into 1 with the timing pair (1, 57) corresponding to the longest path.

In order to compare these results, note that if we did not use this condensed information structure, we would need to include information about each vertex in the graph. In other words, if we accept the (in most cases justifiable) penalty for synchronizing inputs and outputs, we need to store an average of 1.89 terms instead of 3649.86.

We have not considered the case of relaxing the assumptions on the inputs as well. This would obviously increase the amount of data to be stored, but as we have argued, our assumption of synchronized inputs and outputs has a very strong case in its favor.

We have also computed the timing parameters for HLS benchmarks such as the elliptic filter and 16-point FIR filter from [3]. These are naturally SISO systems which makes the synchronizing assumptions unnecessary. If we allow the execution times of adders and multipliers to vary randomly, we find that the FIR filter has a number of different paths which can dominate at different times. The elliptic filter tends to have a single dominant path, but even this information is useful since it can still be used to represent the filter as a single block. In general, systems which have delay elements in the feed-forward section, such as FIR filters and filters with both forward and backward delays, tend to have more timing pairs than systems where the delay elements are restricted to a relatively small amount of feedback.

5. CONCLUSIONS AND FUTURE DIRECTIONS

We have presented the Hierarchical Timing Pair model, and associated data structures and algorithms to provide timing information for use in the analysis and scheduling of dataflow graphs. We have shown that the HTP model overcomes many limitations of the conventional timing models, while incurring a negligible increase in complexity.

Using the examples of the ISCAS and HLS benchmarks, we have demonstrated the power of our approach, and have shown that if we can accept the assumption of synchronizing nodes, we can obtain a reduction by several orders of magnitude in the amount of information about the circuit that we need to store in order to use its timing information in the context of a larger system.

It appears that the HTP model can be efficiently extended to also include multi-rate systems. With certain simple assumptions on the regularity of behavior of such systems, they can be analyzed in the same framework as single rate systems. We are currently working on extending the HTP model to such multirate systems.

6. REFERENCES

[1] M. Potkonjak and M. Srivastava, "Behavioral optimization using the manipulation of timing constraints," IEEE Transactions on Computer Aided Design, vol. 17, pp. 936-947, Oct 1998.
[2] P. G. Paulin and J. P. Knight, "Force-directed scheduling for the behavioral synthesis of ASIC's," IEEE Transactions on Computer Aided Design, vol. 8, pp. 661-679, Jun 1989.
[3] S. M. H. de Groot, S. H. Gerez, and O. E. Herrmann, "Range-chart-guided iterative data-flow graph scheduling," IEEE Transactions on Circuits and Systems - I, vol. 39, pp. 351-364, May 1992.
[4] J. P. Fishburn, "Clock skew optimization," IEEE Transactions on Computers, vol. 39, pp. 945-951, Jul 1990.
[5] H. V. Jagadish and T. Kailath, "Obtaining schedules for digital systems," IEEE Transactions on Signal Processing, vol. 39, pp. 2296-2316, Oct 1991.
[6] N. Chandrachoodan, S. S. Bhattacharyya, and K. J. R. Liu, "The hierarchical timing pair model for synchronous dataflow graphs," Tech. Rep. UMIACS-TR-2000-75, University of Maryland Institute for Advanced Computer Studies, Nov 2000. http://dspserv.eng.umd.edu/pub/dspcad/papers/.
[7] N. Kobayashi and S. Malik, "Delay abstraction in combinational logic circuits," IEEE Transactions on Computer Aided Design, vol. 16, pp. 1205-1212, Oct 1997.
[8] N. Chandrachoodan, S. S. Bhattacharyya, and K. J. R. Liu, "Negative cycle detection in dynamic graphs," Tech. Rep. UMIACS-TR-99-59, University of Maryland Institute for Advanced Computer Studies, September 1999. http://dspserv.eng.umd.edu/pub/dspcad/papers/.
[9] E. Lawler, Combinatorial Optimization: Networks and Matroids. New York: Holt, Rinehart and Winston, 1976.
[10] T. H. Cormen, C. E. Leiserson, and R. L. Rivest, Introduction to Algorithms. Cambridge, MA: MIT Press, 1990.
