An Efficient Clock Tree Synthesis Method in Physical Design: December 2009
All content following this page was uploaded by Yuan Wang on 26 January 2016.
Abstract: This paper proposes a method that helps achieve low clock skew and is applicable to the mainstream industry clock tree synthesis (CTS) design flow. The original clock root is partitioned into several pseudo clock sources at the gate level. The automatic place and route (APR) tool can then synthesize the clock tree with lower clock skew because each pseudo clock source drives a smaller fan-out. The proposed method is applied to a chip-level clock tree network and achieves good results.

Keywords: Physical Design, Clock Tree Synthesis, Low Clock Skew

I. INTRODUCTION

Clock network design is a key aspect of the design process and directly impacts the performance of the chip. The following equation [1] summarizes the relationship among the clock period $P$, the clock skew $s$, the worst-case data path delay $d_{\max}$, and an offset constant $P_o$ for proper timing:

$$P = s + d_{\max} + P_o \tag{1}$$

The clock skew is the maximum difference among the clock latencies from the clock source to the flip-flops. Skew can be calculated at the edge of the clock root in three fashions: rise skew, fall skew, and trigger-edge skew [2]. In this paper, we calculate the skew in the trigger-edge fashion. $P_o$ is a constant that includes data setup time, hold time, latch active time, and other possible offset factors such as safety margins. It is clear from the equation that reducing the cycle time $P$ requires minimizing the skew $s$, in addition to minimizing the worst-case data delay $d_{\max}$ through the combinational logic.

As interconnection delay becomes more dominant in deep submicron (DSM) silicon technologies, clock skew becomes more significant to circuit performance. Therefore, skew minimization is always an important topic in the design of synchronous sequential circuits [3].

With the growing complexity of system designs, clock networks are becoming increasingly complex. Clock nets have larger fan-out and have to be distributed over more function modules. It has been observed that the performance of existing APR tools has deteriorated because of their limited computation capability. To address this issue, the logical solution is to break the clock nets into smaller parts. In [4], this is accomplished by partitioning the chip into several pseudo-partitions at the layout level, based on the cell placement. However, a new Visual Basic based routing tool built on the Exact Zero Skew (EZS) routing algorithm [1] has to be developed to support that methodology, which is time-consuming. Furthermore, other issues discussed in the conclusion section of [4] also restrict the use of this method. Partitioning at the RTL level involves changes to the chip architecture and would increase the complexity of the solution. Apart from these two methods, partitioning at the gate level has not been reported so far, and it is the subject of this study.

In this paper, we partition the original clock root into several pseudo clock sources at the gate level, which requires no changes to the chip architecture and no extra routing tool. The method is applicable to the mainstream industry CTS design flow, which ensures quality and efficiency. The outline of this article is as follows. We first review the common CTS modes supported by mainstream industry APR tools, using Cadence First Encounter (v03.30) as the platform. Next, we conduct a series of experiments and derive the clock tree partition guidance from their results. Then we apply the method to the chip-level clock tree synthesis of an embedded processor and compare the experimental results of the proposed method and the conventional method. Finally, we discuss the results and draw conclusions.

II. REVIEW FOR COMMON CTS MODES

There are two modes for running CTS in the Cadence First Encounter APR tool: manual and automatic [2].

Manual CTS mode allows you to control the number of levels and the number of buffers, and to specify the types of buffers at each level. The following is an example of clock-tree specification file syntax; a graphic representation of that syntax is shown in Fig. 1:
ClockNetName MCLK_GE
LevelNumber 2
LevelSpec 1 2 CLKBUFX20
LevelSpec 2 10 CLKBUFX16
PostOpt YES
End

Fig. 1. Graphic representation of manual CTS mode

Guirong Wu, Song Jia*, Yuan Wang and Ganggang Zhang are with the Key Laboratory of Microelectronics Devices and Circuits, Institute of Microelectronics, Peking University, Beijing, P. R. China.

ClkGroup
+ SH1/I3/Z1
+ SH2/I4/Z2
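To make the structure of the manual-mode specification for MCLK_GE concrete, the following Python sketch reads that text into a small record and counts the buffers it requests. It is only an illustration: the names `ManualClockSpec` and `parse_manual_spec` are hypothetical and not part of First Encounter, and the reading of `LevelSpec` as level index, buffer count, and buffer type is an assumption inferred from the description of manual mode.

```python
from dataclasses import dataclass, field


@dataclass
class ManualClockSpec:
    """Hypothetical container for one manual-mode clock tree specification."""
    net: str = ""
    level_count: int = 0
    # Assumed field order of LevelSpec: level -> (buffer_count, buffer_type)
    levels: dict = field(default_factory=dict)
    post_opt: bool = False


def parse_manual_spec(text):
    """Parse the keywords used in the MCLK_GE example; other keywords are ignored."""
    spec = ManualClockSpec()
    for raw in text.splitlines():
        parts = raw.split()
        if not parts:
            continue
        key, args = parts[0], parts[1:]
        if key == "ClockNetName":
            spec.net = args[0]
        elif key == "LevelNumber":
            spec.level_count = int(args[0])
        elif key == "LevelSpec":
            level, count, buffer_type = int(args[0]), int(args[1]), args[2]
            spec.levels[level] = (count, buffer_type)
        elif key == "PostOpt":
            spec.post_opt = args[0].upper() == "YES"
        elif key == "End":
            break
    if len(spec.levels) != spec.level_count:
        raise ValueError("LevelNumber does not match the number of LevelSpec lines")
    return spec


EXAMPLE = """\
ClockNetName MCLK_GE
LevelNumber 2
LevelSpec 1 2 CLKBUFX20
LevelSpec 2 10 CLKBUFX16
PostOpt YES
End
"""

spec = parse_manual_spec(EXAMPLE)
total_buffers = sum(count for count, _ in spec.levels.values())
print(spec.net, spec.level_count, total_buffers)
```

Under the assumed field order, the example describes a two-level tree with 2 buffers of type CLKBUFX20 at level 1 and 10 buffers of type CLKBUFX16 at level 2.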
TABLE II
STATISTICS OF THE CTS RESULT
Fig. 6. Graphic representation of the partition scheme

Finally, we use First Encounter to synthesize the pseudo clock trees in automatic CTS mode; this case is labeled "new" in the table. For comparison between the proposed method and the conventional method, we also run a CTS experiment on the original clock root, labeled "original". The comparison is summarized in Table III. It shows that, compared with the conventional method, the method proposed in this article reduces trigger-edge skew by 66.3%, at the cost of a 5.88% increase in chip area and a 40% increase in run time.

TABLE III
SUMMARY OF THE COMPARISON

Case      Skew (ps)   Clock Tree Area (μm²)   Total Area (μm²)   Time (normalized)
original  88.4        16695                   452783             1
new       29.8        22473                   479367             1.4

REFERENCES

[1] Tsay, R.-S., “Exact zero skew,” Computer-Aided Design, 1991. ICCAD-91. Digest of Technical Papers, 1991 IEEE International Conference, pp. 336-339, 1991.
[2] Encounter User Guide, Product Version 5.2.1, February 2006.
[3] Chia-Ming Chang, Shih-Hsu Huang, Yuan-Kai Ho, Jia-Zong Lin, Hsin-Po Wang, Yu-Sheng Lu, “Type-matching clock tree for zero skew clock gating,” Design Automation Conference, 2008. DAC 2008. 45th ACM/IEEE, pp. 714-719, 2008.
[4] Reaz, M.B.I., Amin, N., Ibrahimy, M.I., Mohd-Yasin, F., Mohammad, A., “Zero skew clock routing for fast clock tree generation,” Electrical and Computer Engineering, 2008. CCECE 2008. Canadian Conference on, pp. 4-7, May 2008.
[5] Y. P. Chen, D. F. Wong, “An Algorithm for Zero Skew Clock Tree Routing with B,” In: Proceedings of the 42nd annual conference on Design automation, USA, pp. 783-788, 2005.
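The percentages quoted for Table III can be re-derived from the tabulated values, and the effect of the lower skew on the clock period can be illustrated with Eq. (1). The Python sketch below does both. The values of `D_MAX_PS` and `P_O_PS` are hypothetical placeholders chosen only for illustration; they do not come from the paper. Computed from the tabulated total areas, the chip-area increase comes out near 5.87%, marginally below the quoted 5.88%.

```python
# Values copied from Table III.
TABLE_III = {
    "original": {"skew_ps": 88.4, "clock_tree_area_um2": 16695,
                 "total_area_um2": 452783, "time_norm": 1.0},
    "new": {"skew_ps": 29.8, "clock_tree_area_um2": 22473,
            "total_area_um2": 479367, "time_norm": 1.4},
}


def percent_change(old, new):
    """Relative change from old to new, in percent (positive means an increase)."""
    return (new - old) / old * 100.0


def clock_period(skew_ps, d_max_ps, p_o_ps):
    """Eq. (1): P = s + d_max + P_o, with all terms in picoseconds."""
    return skew_ps + d_max_ps + p_o_ps


orig, new = TABLE_III["original"], TABLE_III["new"]

skew_reduction = -percent_change(orig["skew_ps"], new["skew_ps"])
area_increase = percent_change(orig["total_area_um2"], new["total_area_um2"])
time_increase = percent_change(orig["time_norm"], new["time_norm"])

# Hypothetical worst-case data path delay and offset constant (assumptions).
D_MAX_PS = 2000.0
P_O_PS = 200.0

period_orig = clock_period(orig["skew_ps"], D_MAX_PS, P_O_PS)
period_new = clock_period(new["skew_ps"], D_MAX_PS, P_O_PS)

print(f"skew reduction: {skew_reduction:.1f}%")
print(f"total area increase: {area_increase:.2f}%")
print(f"run time increase: {time_increase:.0f}%")
print(f"period saved under the assumed d_max and P_o: {period_orig - period_new:.1f} ps")
```

Because $d_{\max}$ and $P_o$ enter Eq. (1) additively, the period saving equals the skew difference regardless of the placeholder values chosen.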