Clock Training
1 CTS
[Flow diagram: ccopt_design, then route & post-route opt]
• CTS : ccopt_design
– CTS & datapath optimization
– Standard effort or extreme effort
– Switch to propagated clock timing & source latency update
• For CTS only without datapath optimization
clock_design / ccopt_design -cts
• Useful skew at all steps by default
– pre-cts useful skew or early clock useful skew
– standard effort or extreme effort CCOpt
– post-route useful skew
6 © 2018 Cadence Design Systems, Inc. All rights reserved worldwide..
CTS in the Innovus flow – Early clock – How it works
[Flow diagram labels: place_opt_design with early clock (placement, global opt, merge FF, cluster & virtual balance, timing opt, power opt, split FF, cong repair, useful skew), then ccopt_design]
• Clustering and balancing with virtual delays
• CTS timing is used to annotate clock latencies for ideal clock mode timing analysis
• Skewing adjusts the latencies
• Latencies are communicated to later CTS via pin insertion delays in the in-memory clock tree spec
Note: Do not use reset_cts_config / reset_ccopt_config before ccopt_design; doing so will delete the pin insertion delays.
CTS in the Innovus flow – Useful skew controls – Common UI
Trunk / leaf
• Leaf net type applies to any net directly connected to a sink
• Can force the net connected to a sink to be considered trunk
set_db pin:name .cts_routing_trunk_override true
set_ccopt_property trunk_override -pin name true
Insertion delay
[Figure: macro m0 with clock pin ck]
• SDC / user command approach
– Specify early or late clock arrival time
– Early typically negative due to 0 network latency
set_clock_latency -1 [get_pins {m0/ck}]
• CTS pin insertion delay approach
– Specify the clock insertion delay inside the macro (note the inverted sign relative to set_clock_latency)
set_db pin:macro/CK .cts_pin_insertion_delay 1ns
set_ccopt_property insertion_delay 1ns -pin m0/CK
• Library max clock tree path approach
– Library specifies the clock insertion delay inside the macro
convert_lib_clock_tree_latencies
• Recommendation
– Specify in SDC or use library max clock tree path
– Visible to slack driven placement and non-ECF pre-cts optimization
• create_clock_tree_spec converts SDC clock latency to CTS pin insertion delay
• To delay sinks instead of advancing them, invert the sign
• Reporting – .logv or command
report_pin_insertion_delays
report_ccopt_pin_insertion_delays
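The inverted sign between the SDC latency and the CTS pin insertion delay can be illustrated with a short Python sketch. It assumes the SDC latency is given in ns and that the equivalent insertion delay is simply the negated latency, as in the -1 / 1ns pair on this slide. The helper names are hypothetical, and the emitted string only mirrors the set_db form shown above.

```python
def latency_to_insertion_delay(sdc_latency_ns: float) -> float:
    """Negate an SDC clock latency to get the equivalent pin insertion delay.

    Assumption: a latency of -1 (clock arrives 1 ns early at the macro pin)
    corresponds to 1 ns of clock insertion delay inside the macro.
    """
    # Subtract from 0.0 so that a latency of 0 yields 0.0 rather than -0.0.
    return 0.0 - sdc_latency_ns


def insertion_delay_command(pin: str, sdc_latency_ns: float) -> str:
    """Format a command string in the set_db form shown on the slide."""
    delay = latency_to_insertion_delay(sdc_latency_ns)
    return f"set_db pin:{pin} .cts_pin_insertion_delay {delay:g}ns"


print(insertion_delay_command("m0/CK", -1.0))
```

A positive SDC latency would produce a negative insertion delay, which matches the slide's remark that delaying a sink instead of advancing it means inverting the sign.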
Setup – Stop & ignore pins
• Stop & ignore pins
– Clock spec creation stops tracing at the pin
– Clock spec creation will trace to this pin, even if SDC clocks do not
• Stop pin
– The pin is considered a sink to be balanced in any skew groups which reach it, even if the pin is not identifiable as a “clock pin” from the library model
set_db pin:name .cts_sink_type stop
set_ccopt_property sink_type stop -pin name
• Ignore pin
– The pin is not balanced
set_db pin:name .cts_sink_type ignore
set_ccopt_property sink_type ignore -pin name
• Skew group specific ignore pin
– A pin which is ignored in a skew group, but may be balanced in other skew groups
update_skew_group ... -add_ignore_pins ...
modify_ccopt_skew_group ... -add_ignore_pins ...
• Recommendation: Continue to specify only LVT cells for clock cells
[Figure: drive strength and area versus buffer index (sorted based on drive)]
freq / (mins) designs
#CPUs
Ref 18.1 200 190 191
• CPU, GPU,
Automotive,
1.4 M 2.3G/16 168 57 Networking
149
150
• 1hour / 1M
50 46
41
34 33
2.3 M 2.3G/16 343 111
instances
0
1.8 M 2.3G/16 145 61 1 2 3 4 5 6 7 8 9 average
2.3 M 3.0G/16 541 162 • Core CTS & services (legalization, routing)
2.2 M 3.0G/16 420 137 • Multi-threading in some steps & reporting
5.6M 2.3G/16 667 182 • Post-CTS opt excluded
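As a rough check of the "1 hour / 1M instances" figure, the rows listed above can be normalized to minutes per million instances. This Python sketch assumes the last two numbers in each row are the Ref and 18.1 runtimes in minutes, which is an interpretation of the flattened table rather than something the slide states outright; the function names are illustrative.

```python
# (instances in millions, Ref minutes, 18.1 minutes), copied from the rows above.
RUNS = [
    (1.4, 168, 57),
    (2.3, 343, 111),
    (1.8, 145, 61),
    (2.3, 541, 162),
    (2.2, 420, 137),
    (5.6, 667, 182),
]


def minutes_per_million(instances_m: float, minutes: float) -> float:
    """Runtime normalized to one million instances."""
    return minutes / instances_m


def speedup(ref_minutes: float, new_minutes: float) -> float:
    """How many times faster the new runtime is than the reference."""
    return ref_minutes / new_minutes


for inst, ref, new in RUNS:
    print(f"{inst:.1f}M: {minutes_per_million(inst, new):5.1f} min/M in 18.1, "
          f"{speedup(ref, new):.2f}x vs Ref")
```

Under that reading, every 18.1 run falls between roughly half an hour and a little over an hour per million instances.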
Concepts
[Figure: clock network with m0 and flops f1, f2, f3, f4. Labels: "generated clock_tree:gck1, reporting only skew group" and "no clock tree, reporting only skew group". Legend: green - SDC, blue - clock spec]
Note: Skew groups and clock trees often have the same name
Concepts – Auto clock spec – Multi mode example
[Figure: clock input ck, divider flop d1, and a mux with inputs 0 and 1 selected by sel_div. Legend: blue – mode0, green – mode1]
mode0.sdc
create_clock [get_ports {ck}]
create_generated_clock -name gck -divide_by 2 \
    [get_pins {d1/Q}] -source [get_pins {d1/CK}]
set_case_analysis 0 [get_ports {sel_div}]
mode1.sdc
create_clock [get_ports {ck}]
create_generated_clock -name gck -divide_by 2 \
    [get_pins {d1/Q}] -source [get_pins {d1/CK}]
set_case_analysis 1 [get_ports {sel_div}]
• Two skew groups with source at ‘ck’ input – one per clock per mode
• One ignored at mux ‘0’ input and other ignored at mux ‘1’ input
• Paths through the mux ‘0’ input are not balanced with paths through the mux ‘1’ input
Concepts – CTS internal flow – Standard effort
ccopt_design steps:
• Initialization – library trimming, identify placeable area, validate transition & skew targets, log settings
• Construction – clustering, legalization, DRV repair
• Implementation
• EGR Post-Conditioning
• Clock Routing
• NR Post-Conditioning
• Post-CTS Optimization
Followed by route & post-route opt
Notes:
• Post-conditioning may resize and move clock cells small distances, leaving small opens in clock nets
• Optimization may modify or add clock cells, or re-size flops, also leaving small opens in clock nets
• Design routing repairs clock nets first, closing the opens
• Post-route optimization includes CTS PRO to further repair clock DRVs
18.1: Many improvements in EGR, EGR Post-Conditioning, NR Post-Conditioning
Concepts – DAG Stats
Clock DAG stats after update timingGraph:
cell counts : b=719, i=2653, icg=6495, nicg=0, l=824, total=10691
cell areas : b=1668.215um^2, i=8067.257um^2, icg=54799.189um^2, nicg=0.000um^2, l=7370.830um^2, total=71905.491um^2
cell capacitance : b=1.024pF, i=15.171pF, icg=15.285pF, nicg=0.000pF, l=3.642pF, total=35.122pF
sink capacitance : count=104926, total=255.000pF, avg=0.002pF, sd=0.001pF, min=0.001pF, max=0.040pF
wire capacitance : top=0.000pF, trunk=98.883pF, leaf=162.988pF, total=261.871pF
wire lengths : top=0.000um, trunk=573904.625um, leaf=776348.715um, total=1350253.340um
Clock DAG net violations after update timingGraph:
Remaining Transition : {count=1, worst=[9.7ps]} avg=9.7ps sd=0.0ps sum=9.7ps
Fanout : {count=3, worst=[20, 20, 6]} avg=15.333 sd=8.083 sum=46
Capacitance : {count=4, worst=[1.714pF, 1.451pF, 1.166pF, 0.002pF]} avg=1.083pF sd=0.755pF sum=4.333pF
Clock DAG primary half-corner transition distribution after update timingGraph:
Trunk : target=300.0ps count=3849 avg=90.5ps sd=77.4ps min=0.0ps max=300.0ps {3135 <= 180.0ps, 459 <= 240.0ps, 174 <= 270.0ps, 44 <= 285.0ps, 37 <= 300.0ps}
Leaf : target=300.0ps count=6980 avg=166.2ps sd=67.5ps min=16.7ps max=309.7ps {4178 <= 180.0ps, 1714 <= 240.0ps, 619 <= 270.0ps, 296 <= 285.0ps, 172 <= 300.0ps} {1 <= 315.0ps, 0 <= 330.0ps, 0 <= 360.0ps, 0 <= 450.0ps, 0 > 450.0ps}
Clock DAG library cell distribution after update timingGraph {count}:
Bufs: CTB_F4_SVT: 27 GLITCHGOBBLER_F4_DH_SVT: 2 CTB_F1_SVT: 690
Invs: INV_B16_SVT: 67 INV_B14_SVT: 68 INV_B12_SVT: 137 INV_B10_SVT: 175 INV_B8_SVT: 122 INV_B6_SVT: 186 INV_B5_SVT: 117 CTI_F4_SVT: 123 INV_B4_SVT: 141 INV_B3_SVT: 189 INV_B2_SVT: 258 INV_B1_SVT: 1070
ICGs: ICG_F6_SVT: 81 ICG_F5_SVT: 72 ICG_F4_SVT: 942 ICG_F3_SVT: 385 ICG_F2_SVT: 1792 ICG_F1_SVT: 3223
Logics: CTNAND2_F4_AHP_SVT: 2 CTOR2_F4_AHP_SVT: 67 CTMUX2_F4_SVT: 685 CTEXOR2_F4_SVT: 5 CTENOR2_F4_AHP_DH_SVT: 5 CTAND2_F4_SVT: 58 NOR2IA_F8_4SR_75LL: 1 CTOR2_F4_SVT: 1
Primary reporting skew group after update timingGraph:
skew_group PLLCLK/DFT: primary reporting skew group
Half-corner MAX_DELAY_CORNER:setup.late: insertion delay [min=7146.5, max=7596.1, avg=7428.6, sd=90.2], skew [449.6 vs 400.0*], 99.2% {7182.6, 7582.6} (wid=1060.4 ws=584.8) (gid=7052.3 gs=859.6)
Skew group summary after update timingGraph:
...
Slide callouts, in log order: counts, area, cap, length (DAG stats); drv violations (net violations); transition times (transition distribution); library cell usage (library cell distribution); all skew groups (skew group summary)
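The numbers in this log excerpt are internally consistent, which a short Python sketch can confirm. It assumes each stats line is a list of key=value pairs with optional unit suffixes, and that the reported skew is the maximum minus the minimum insertion delay, with the second number being the target. That reading fits this excerpt but is not a documented log format, and the helper names are hypothetical.

```python
import re

COUNTS = "cell counts : b=719, i=2653, icg=6495, nicg=0, l=824, total=10691"
WIRE_CAP = ("wire capacitance : top=0.000pF, trunk=98.883pF, "
            "leaf=162.988pF, total=261.871pF")
SKEW = ("Half-corner MAX_DELAY_CORNER:setup.late: insertion delay "
        "[min=7146.5, max=7596.1, avg=7428.6, sd=90.2], skew [449.6 vs 400.0*]")


def parse_fields(line):
    """Return {key: float} for every key=number pair, ignoring unit suffixes."""
    pairs = re.findall(r"(\w+)=([-+]?\d+(?:\.\d+)?)", line)
    return {key: float(value) for key, value in pairs}


def parts_match_total(fields, places=3):
    """Check that the non-total fields add up to the reported total."""
    parts = sum(v for k, v in fields.items() if k != "total")
    return round(parts, places) == round(fields["total"], places)


def skew_check(line):
    """Return (max - min insertion delay, reported skew, target skew)."""
    fields = parse_fields(line)
    reported, target = re.search(r"skew \[([\d.]+) vs ([\d.]+)", line).groups()
    return round(fields["max"] - fields["min"], 1), float(reported), float(target)


print(parts_match_total(parse_fields(COUNTS)))
print(parts_match_total(parse_fields(WIRE_CAP)))
print(skew_check(SKEW))
```

Here the computed and reported skew agree and both exceed the target, which is presumably what the trailing asterisk in the log flags.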
Concepts – Clock tree debugger (CTD)
[Figure: CTD shown in insertion delay view and in unit delay view]
• Open in unit delay view to avoid invoking RC extraction or delay calculation, e.g. on an unplaced design, or at post-route
gui_open_ctd -unit_delay
ctd_win -unit_delay
• How to find clock tree instances which are fully don’t touch by user
– get_db clock_trees .insts -expr {"user true" in $obj(.dont_touch_sources)}
– See also report_preserves to report on the whole design
• Some existing users are already using “don’t touch means don’t size”
– Setting was “set_ccopt_property allow_resize_of_dont_touch_cells false”.
– This property, and the corresponding private CUI attribute, will be phased out in 18.2/19.1
Concepts – Get CTS right before optimizing timing
✓ Run check_design
✓ Check warnings and errors in the log
✓ Check maximum insertion delay
– If too large consider cluster run, and check the maximum insertion delay path in the CTD & layout
✓ Check no unbuffered nets and DRV violations
– Check warnings in the log, color options in the CTD
– Check in CTD and layout view
– Check MSV warnings & setup
– Use cluster run to debug if needed
✓ Get CTS right before optimizing timing
– No optimization – clock_design / ccopt_design -cts
– Check skew group insertion delay & skew occupancy
– Check report_clock_trees/report_skew_groups
– Check DAG stats at CTS internal flow steps
– Look at the clock tree debugger
• Cluster run
set_db cts_balance_mode cluster
set_ccopt_property cts_balance_mode cluster
• Disable detail routing to save time
set_db cts_route_clock_tree_nets false
set_ccopt_property use_estimated_routes_during_final_implementation true
• Measuring cost of balancing
– Clustering depends on the skew target. Cluster run skips some optimizations.
– Recommend increasing the skew target rather than using a cluster run.
Concepts – Max clock tree path – Overview (18.1)
• Cell library optional max_clock_tree_path attribute
– Specifies cell internal clock tree delay, index by input transition
– Is NOT a timing arc, not included in any timing analysis/report
• Timing significance
– Missing early offsets for memories are a common cause of bad reg2mem hold TNS
– Optimistic/pessimistic setup timing for reg2mem/mem2reg
• Historical CTS support
– Off by default, behaves like an additional pin insertion delay
– Too late in the flow – no visibility at placement
• New command convert_lib_clock_tree_latencies
– Converts MCTP (max_clock_tree_path) data to per-pin clock latencies
– Aware that SDC pin latencies override SDC clock latencies
– Updates in-memory timing constraints or exports SDC text
[Figure: macro with internal clock insertion delay, labeled max_clock_tree_path]
Concepts – Max clock tree path – Flow (18.1)
convert_lib_clock_tree_latencies converts .lib max_clock_tree_path to set_clock_latency [get_pins]
Flow: unplaced initial DB → convert_lib_clock_tree_latencies → create_clock_tree_spec → place_opt_design
convert_lib_clock_tree_latencies [-views <views>]
    [-latency_file_prefix <string>]
    [-pins <pins>]
    [-override_existing_latencies[_pins <pins>]]
    [-sum_existing_latencies[_pins <pins>]]
Cannot merge reason: UniqueUnderParent. This means two clock gates have the same ‘globally unique enable’ controlling their enable input, but are in the fanout of logically different clock gates further up the tree. Merging these would not be logically equivalent.
Note: Similar situation with SOCV delay sigma – CTS excludes, CTE includes
Concepts – Antenna diodes (18.1)
[Figure labels: input port, block design, preserved]
• Some flows have antenna diodes present pre-CTS
– For example if added at clock input ports during top level partitioning
• 18.1 Solution
– Avoids all the problems with the old scheme
– Easy to write down with pencil & paper
CTS_ccl_buf_1234
– The number is unique over all CTS inserted cell instances
[Figure: two arrangements of cells labeled F, IC and G, one marked "Bad for clock power" and the other "Good for clock power"]
Clock Power – Buffering under leaf ICGs
[Figure: OLD vs NEW, each showing cells labeled G and B]
Define H-tree(s)
create_flexible_htree …
[create_flexible_htree …]
[Figure labels: clock_tree_source_group myhtree; generated clock_tree flexible_htree_myhtree_0...3]
• Generic form
– set_ccopt_property <property name> <value> [-<object_type> <object name>]
– get_ccopt_property <property name> [-<object_type> <object name>]
• What happens when object/index is not specified?
– set_ccopt_property target_skew 0.123
– Sets skew target on all existing skew groups as well as the skew target used for any future created skew groups
– get_ccopt_property target_skew
– Returns a single value only if all skew groups and the ‘unkeyed’ value used for new skew groups have the same setting
– Otherwise returns an unfriendly list of skew group names and targets not accepted by set_ccopt_property
• Types of indexing
– By attributes on different object types: skew_group, clock_tree
– By net type: top/trunk/leaf
– By object type: power_domain, delay_corner, ...
• Setting skew target for current and future defined skew groups
– set_db cts_target_skew 200ps
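The set/get behavior described above, when no object is specified, can be pictured with a toy Python model. This is only an illustration of the semantics in the bullets, not how the tool stores properties; the class, its method names, and the skew group names are all hypothetical.

```python
class KeyedProperty:
    """Toy model of a property with per-object values and an 'unkeyed' default."""

    def __init__(self, default):
        self.default = default  # value used for objects created later
        self.values = {}        # existing object name -> value

    def add_object(self, name):
        """A newly created object picks up the current unkeyed default."""
        self.values[name] = self.default

    def set_value(self, value, name=None):
        """Set one object, or every existing object plus the default."""
        if name is not None:
            self.values[name] = value
            return
        self.default = value
        for key in self.values:
            self.values[key] = value

    def get_value(self, name=None):
        """Return one value, or a single value only when all settings agree."""
        if name is not None:
            return self.values[name]
        distinct = set(self.values.values()) | {self.default}
        if len(distinct) == 1:
            return self.default
        # Mixed settings: fall back to a list of (object, value) pairs.
        return sorted(self.values.items())


target_skew = KeyedProperty(default=0.0)
target_skew.add_object("sg_a")
target_skew.add_object("sg_b")
target_skew.set_value(0.123)          # unkeyed: all skew groups and the default
print(target_skew.get_value())
target_skew.set_value(0.2, name="sg_a")
print(target_skew.get_value())        # settings now differ, so a list comes back
```

The second query shows why an unkeyed get is awkward once per-object overrides exist: the result is no longer a value that could be passed straight back to a set.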
• Feedback
• Email comments, questions, and suggestions to [email protected].
© 2018 Cadence Design Systems, Inc. All rights reserved worldwide. Cadence, the Cadence logo, and the other Cadence marks found at www.cadence.com/go/trademarks are trademarks or registered trademarks of
Cadence Design Systems, Inc. All other trademarks are the property of their respective owners.