Physical Design Essentials
Physical Design Essentials
Physical Design Essentials
Course contents:
Readings
Unit 8
W&C&C: Chapter 13
S&Y: Chapter 7
Y.-W. Chang
Unit 8
Clock Routing
p0
p2
p6
p5
p3
Clock-tree synthesis (CTS): make the clock nets a tree
Unit 8
Y.-W. Chang
ASICON-92; Chao & Hsu & Ho, DAC-92; Edahiro, NEC R&D, 1991.
Simulation-based CTS
Y.-W. Chang
Y.-W. Chang
Unit 8
Y.-W. Chang
Unit 8
Y.-W. Chang
submicron wires.
Resistor ri must charge all downstream capacitors.
Elmore delay: Delay can be approximated as sum of sections:
resistance downstream capacitance.
Unit 8
Y.-W. Chang
Wire Models
Lumped circuit approximations for distributed RC lines: -model
(most popular), T-model, L-model.
Unit 8
Y.-W. Chang
Unit 8
Y.-W. Chang
10
where and are the per unit values of resistance and capacitance,
l the length of the interconnecting wire, r1 = xl, c1 = xl, r2 = (1 - x)l,
c2 = (1 - x)l.
Unit 8
Y.-W. Chang
11
Zero-Skew Computation
Balance delays: r1(c1/2 + C1) + t1 = r2 (c2/2 + C2) + t2.
(): per
Unit 8
merging segment
Y.-W. Chang
12
Boese & Kahng, ASICON-92; Chao & Hsu & Ho, DAC92; Edahiro, NEC R&D, 1991
Consists of two stages: bottom-up + top-down
Bottom-up: Build the potential embedding locations of
clock sinks (i.e., a segment for potential tapping points)
Top-down: Determine exact locations for the embedding
Unit 8
Y.-W. Chang
13
Unit 8
Y.-W. Chang
14
Unit 8
Y.-W. Chang
15
Clock Meshes
16
Effects of IR drop
1.46V
1.8V
violation
HM1
HM1
SM1
HM2
HM3
SM2
SM2
Unit 8
HM1
SM1
HM2
2.89V
SM2 HM3
3V
HM3
SM2
HM2
SM1
2.95V
Y.-W. Chang
17
Unit 8
Y.-W. Chang
18
Unit 8
Y.-W. Chang
19
Problem Formulation
Let G = {N, B} be a P/G network with n nodes N = {1, , n} and b
branches B = {1, , b}; branch i connects two nodes: i1 and i2 with
current flowing from i1 to i2.
Vi1 vi 2
li
.
Ii
wi
i2
i1
wi
li
Y.-W. Chang
20
Constraints
The voltage IR drop constraints.
Vi Vh, min for power networks.
Vi Vl , max for ground networks.
The minimum width constraints: wi
liIi
wi , min
Vi1 Vi 2
I 0
i
iB ( j )
Unit 8
Y.-W. Chang
21
Equivalent circuit
The equivalent
resistor Rs is just the sum of all the resistors in
n 1
series, Rs
R.
i
i 1
Unit 8
Y.-W. Chang
22
Ri
Vi 1 Vi Vs RiIei
Rs
Iei 1 Iei Ii
Equivalent circuit
Unit 8
Y.-W. Chang
23
Unit 8
Y.-W. Chang
24
no
iterative loop
RC Extraction
Simulation
SI Analysis
Traditional flow
Unit 8
IR-drop Analysis
OK
yes
P&R
P&R
RC Extraction
RC Extraction
Simulation
Simulation
SI Analysis
no
OK
iterative loop
yes
Floorplanning
SI Analysis
DAC-04 flow
(Wu & Chang)
Y.-W. Chang
Unit 8
Y.-W. Chang
26
Unit 8
Y.-W. Chang
27
Unit 8
Y.-W. Chang
28
Worst-case
interconnect
delay due
to crosstalk
60
Delay (ps)
50
40
30
Interconnect
delay
20
10
Gate delay
650
500
350
250
180
150
100
70 (nm)
Technology Node
In 0.18m wire-to-wire
capacitance dominates (CW>>CS)
Unit 8
Y.-W. Chang
CS
CW
29
Total delay
Unit 8
Y.-W. Chang
30
Wire Sizing
Wire length is determined by layout architecture, but we can
choose wire width to minimize delay.
Wire width can vary with distance from driver to adjust the
resistance which drives downstream capacitance.
Wire with minimum delay has an exponential taper.
Can approximate optimal tapering with segments of a few
widths.
Recent research claims that buffering is more effective than
wire sizing for optimizing delay, and two wire widths are
sufficient for area/delay trade-off.
Unit 8
Y.-W. Chang
31
As n , Dn D:
Optimal wire sizing function f(x) = ae-bx, where
Unit 8
x
Y.-W. Chang
32
unit wire area capacitance c0, unit wire fringing capacitance cf, unitsized wire resistance r0, unit-size capacitance of a buffer cb, unitsize buffer resistance rb, intrinsic buffer delay Tin, and the number
of buffers N.
Objective: Determine the stage ratio for buffer sizes and the
stage ratio for wire widths such that the wire delay is minimized.
Unit 8
Y.-W. Chang
33
Unit 8
Y.-W. Chang
34
Minimize Dmax
subject to Di (
w) D
max
, i 1..m
L wi U , i 1..n
w9
w7
w4
Unit 8
w11
D1<Dmax
w5
w10
b
w1
w8
w6
w3
Y.-W. Chang
w2
D2<Dmax
35
Unit 8
36
w7
w4
Unit 8
w11
D1<Dmax
w5
w10
b
w1
w8
w6
w3
Y.-W. Chang
w2
D2<Dmax
37
w1
w2
Loads
w3
1 D1
2 D2
w4
Unit 8
Y.-W. Chang
w5
38
min
st
cx
Axb
xX
Mathematical
formulation
Unit 8
Posynomial
forms
Positive coefficient
polynomials
Y.-W. Chang
min
st
L()=cx + (Ax-b)
xX
Lagrange multipliers
39
Mathematical Programming
Formulation:
Minimize f ( x)
subject to g i ( x) 0, i 1..m
m
Lagrangian: L( ) f ( x) g ( x), where 0
i i
i
i 1
m
L( )
0 f ( x ) g ( x ) 0
i i
xi
i 1
Unit 8
Y.-W. Chang
40
Lagrangian Relaxation
Minimize f ( x)
LRS
Minimize f ( x) i gi ( x)
i 1
subject to g i ( x) 0, i 1..n
subject to gi ( x) 0, i n 1..m
g i ( x) 0, i n 1..m
Unit 8
Y.-W. Chang
41
Lagrangian Relaxation
Minimize Dmax
subject to Di (
w) D
max
, i 1..m
L wi U , i 1..n
Lagrangian Relaxation
m
Minimize Dmax i ( Di (
w) D
i 1
max
subject to L wi U , i 1..n
m
L
By
0 , we have i 1
Dmax
i 1
Minimize
D (w )
i 1
subject to L wi U , i 1..n
Unit 8
Y.-W. Chang
42
Lagrangian Relaxation
Lagrangian
Relaxation
Augmented
Lagrangian
Weighted
Delay
SQP
TILOS
SLP
Algorithmic
approaches
Unit 8
Mathematical
Programming
Y.-W. Chang
43
Update Multipliers
Weighted Delay
Optimization
Converge?
No
Yes
done
Unit 8
Y.-W. Chang
44
D1
D2
1 2
1 2
Dmax
Unit 8
D1 D2
Dmax
Y.-W. Chang
D1 D2
45
Weighted Minimization
D1
D2
b
w2
Unit 8
w3
Y.-W. Chang
46
Step 1 :
new
i
old
i
k ( Di Dmax ),
where lim k 0, k
k
k 1
Unit 8
Y.-W. Chang
47
Convergence Sequence
Minimize
Max Delay
i Di ( w )
i 1
subject to L wi U , i 1..n
Any Feasible Maximum Delay =
Upper Bound
Optimal Solution
Lagrangian = Lower Bound
Weighted Delay <= Maximum Delay
# Iterations
Unit 8
Y.-W. Chang
48
d2
Aa
Ab
D1
d3
D2
Ac
Aa d1 d 2 D1
Ab d1 d 2 D1
Ab d1 d 3 D2
Ac d 3 D2
Unit 8
Exponential growth
More accurate
Can exclude false paths
Y.-W. Chang
49
Ae
d2
D1
d3
D2
Ac
Aa d1 Ae
Ab d1 Ae
Ae d 2 D1
Ae d 3 D2
Polynomial size
Less accurate
Contains false paths
Ac d 3 D2
Unit 8
Y.-W. Chang
50
Path Based
1
31
53
32
jinput ( i )
Unit 8
ji
ik
koutput ( i )
3
5
jinput ( i )
Y.-W. Chang
ji
ik
koutput ( i )
51
42
52
3,in 3,out
43 5331 32
41
51
Appendix A:
Shih and Chang
Fast timing-model independent clock-tree synthesis
DAC-10, TCAD-12
Unit 8
Y.-W. Chang
52
Introduction
timing
vdd
vss
insufficient accuracy
time-consuming
solution?
Unit 8
Y.-W. Chang
53
Symmetrical Structure
Is timing-model independent
Do not need simulation information
n4 (1,57)
n2
n1 (12,26)
(3,29)
n6 (33,29)
n3 (33,19)
n5 (5,1)
n7 (33,
9)
n4
n1
n2
n5
snaking
n6
n'3 (21,23)
n7
Y.-W. Chang
54
Problem Formulation
Question
Unit 8
Y.-W. Chang
55
Specification
Flow
Input
Branch-Number Planning
Tree Construction
Buffer Insertion
Output
Unit 8
Y.-W. Chang
56
Branch-Number Planning
Observation
prime
Planning
Y.-W. Chang
57
Branch-Number Planning
Unit 8
BNP = B(212+3)
B(212+1)
B(212+2)
B(212)
B(212+4)
= <=53,
< 3,
71,
107,
43,
2,3,2
33,
5
2>>2, 2, 2 >
branches
3
3
3
2
2
3
3
2
pseudo sinks
sinks
Y.-W. Chang
2
2
58
Tree Construction
Flow
Partitioning
Embedding-Region
Construction
Root?
Y
Node Embedding
Unit 8
Y.-W. Chang
59
TRRi
extended TRR
core
Manhattan
distance
radius
Unit 8
extended
radius
TRRj
Y.-W. Chang
60
Partitioning
Unit 8
Y.-W. Chang
61
Polar-Angles Sorting
center point
Unit 8
Y.-W. Chang
Divided Result
cluster diameter
62
Recursive Dividing
Unit 8
Y.-W. Chang
Corresponding Clusters
63
Embedding-Region Construction
Region Extension/Intersection
Resulting Regions
CCL
CCL
embedding
region
Unit 8
Y.-W. Chang
64
Node Embedding
Root Embedding
Level-1 Embedding
Level-2 Embedding
clock source
tree root
level -1
CCL
the closest
position
Unit 8
level -2
CCL
snaking
Y.-W. Chang
65
Pseudo-Sink Handling
For partitioning
Relax the sizes of clusters in a partition which can
differ by at most one for the first recursion
For embedding-region construction
Construct no embedding regions for pseudo sinks to
reserve the flexibility of snaking
For node embedding
Let the embedding regions of pseudo sinks cover
entire chip
Dangling wires can be identified and attached to proper
sub-trees successfully
Unit 8
Y.-W. Chang
66
Buffer Insertion
Unit 8
Second-Time Insertion
Y.-W. Chang
Third-Time Insertion
67
(fF)
(s)
(ps)
(fF)
(s)
Ours
skew
(ps)
usage runtime
(fF)
(s)
r1
267
14.005 14001
5.012 15229
5126
1.510 13829
0.070
r2
598
16.012 28011
11
6.421 29234
7374
1.770 31056
0.280
r3
862
16.532 39123
26
5.611 41431
12739
2.310 44188
1.050
r4
165
5.418 91015
17871
2.540 98450
3.350
r5
3101 21.557 149875 498 7.028 156854 26045 3.010 171228 5.560
avg.
7.93 0.92 46.29 2.77 0.96 24343.13 1.00 1.00 1.00
comparison
Unit 8
68
Unit 8
Y.-W. Chang
69
Appendix B:
Liu and Chang
Floorplan and power/ground network co-synthesis
for fast design convergence
ISPD-06 (TCAD-07)
B
A
Unit 8
Y.-W. Chang
70
Unit 8
Y.-W. Chang
71
Post-layout verification
AstroRail
Unit 8
Y.-W. Chang
72
Initialize B*-tree
and temperature T
T
p min 1, e
Perturb B*-tree
Perturbations (neighboring solutions) Pack B*-tree
Op1: Rotate a block
Construct
Update T
Op2: Move a node/block to
P/G network
another place
Update P/G
Evaluate cost
pitch Dpitch
Op3: Swap two nodes/blocks
Op4: Resize a soft block
N
Better ?
The cost function is based
Accept?
Y
N
on the floorplan cost and P/G
Y
Keep solution
network cost
Recover last
T is decreased every n cycles,
solution
Cool/Good
where n is proportional to the
enough?
N
number of blocks
Y
Unit 8
Y.-W. Chang
73
Cost Function
Cost function:
Wirelength
P/G Density
W A
A
2
pitch
D
W: Wirelength
A : Area
: P/G network cost (penalty of power integrity violation)
Dpitch: pitch of P/G network
Unit 8
for higher P/G density and larger one for lower P/G
Smaller
density
Y.-W. Chang
74
100
/ avg 10
avg
1
0.1
Dpitch
Dpitch
1
0
SA process
-1
0.01
1
0.1
0.01
0.001
0.0001
0.00001
Temperature
Unit 8
/ avg
75
EM cost
Bem
B
IR-drop cost
(1 )
pvi Pv
PiP
v pvi
Vlim, pi
, 0 1
Y.-W. Chang
76
Height
Floorplan
3+1 =4
Width
Y.-W. Chang
77
Power pad
Module
Global trunk
node
Power
trunk
Power strap
Unit 8
Power pin
Y.-W. Chang
78
Power pad
Module
Reduced circuit
Power
trunk
Power strap
Unit 8
Power pin
Y.-W. Chang
79
d/2 d d/2
Unit 8
Y.-W. Chang
80
Overlapping Area
d/2 d d/2
Unit 8
Y.-W. Chang
81
3mA
1mA 1mA
5mA
Overlapping
Area
Unit 8
4mA
Y.-W. Chang
82
Gx = i
Unit 8
Y.-W. Chang
83
The closer the P/G pin is placed to the power pad, the smaller
the IR-drop
1.9V
0.6V
3.4V
2.8V
2.5V
2.4V
100mA
30mA
20mA
10mA
5V
5V
10mA
Unit 8
10
-0.4V
20mA
30mA 100mA
Y.-W. Chang
84
Right-boundary condition
Top-boundary condition
Unit 8
Y.-W. Chang
85
Power-Hungry Modules
Unit 8
Y.-W. Chang
86
Results on OpenRISC1200
OpenRISC1200
*Astro
Flow
*Astro w/
IR-drop
Driven
Placement
Our
Improv.
vs.
Astro w/
IR-drop
Our
Flow
3.86
3.86
3.33
15.9%
Utilization (%)
62
62
72
13.9%
Wirelength (m)
1655463 1539125
1540172 -0.1%
8.62
8.54
8.55
-0.1%
78.20
55.14
41.8%
346
135
2.56X
Iterations
Y.-W. Chang
87
B
A
Unit 8
Y.-W. Chang
88