Physical Synthesis 2.0
Andrew B. Kahng
UCSD CSE and ECE Departments
[email protected]
http://vlsicad.ucsd.edu
Orthogonalize concerns
Function vs. implementation
Logic vs. timing vs. embedding
ECE 260B CSE 241A Intro and ASIC Flow 2 Andrew B. Kahng, UCSD
Concept: How the IC Design Flow is Evolving [UCSD ECE 260B / CSE 241A]
Design closure through tight integrations
RTL, GDSII signoffs = business structure of semiconductor creation
One-pass flow: required for Productivity, requires Predictability
By Guardbands?
By Unifications?
By Statistics?
By Methodology (to avoid issues)?
[Figure: design flow. Synthesis -> Gate Netlist -> FP, Place, CTS, Opt -> Updated Gate Netlist -> Routing -> GDSII -> Manufacturing, with Extraction, Timing, Physical Verification alongside]
Outline
Why Physical Synthesis
Physical Synthesis 1.0
Example Challenges / Stressors
FinFET
Noise and Chaos
Clock Skew
Complexity and Hyperlocality
Better (and, more complex) Signoff
New Mixed-Height Sweet Spot?
Physical Synthesis 2.0 ?
[Figure: Power (mW, axis 190 to 230) vs. shift in the location of the blockage (0% to 100%); macro sizes 260µm x 65µm and 184µm x 92µm]
Predict by doing
Constructive prediction
(Run under the hood quick and dirty, else no leverage)
A. B. Kahng, Physical Synthesis 2.0, IWLS-2015 Keynote 6
Physical Synthesis
Inputs:
Floorplan information
RC tech file (tluplus, captable)
Libraries, LEF, tech files
Out:
Better netlist (usually), at one (worst) corner
Better netlist (usually) + placed DEF (not legalized)
N.B.: very fast TAT required by customers
Routed Results: about 5 iterations
Breakdown of Timing Violations on per Block Basis
BTI
BEOL, MOL variations
Signoff criteria with AVS
SOC complexity
Fill effects
Layout rules
http://www.synopsys.com/Company/Publications/DWTB/Pages/dwtbfinfetjan2013.aspx
[Figure: FinFET layout; layers: NWell, Active, Fin, Poly, MOL1, MOL2, VIA0 (MOLx to M1), M1, VIA1 (M1 to M2), M2; dimension labels: 1Pfin, 2Pfin, 3Pfin, 4Ppoly]
http://www.synopsys.com/Company/Publications/DWTB/Pages/dwtbfinfetprocesssoc2015q1.aspx
FinFET: Aggressive Voltage Scaling
FinFET enables voltage scaling for reduced dynamic power
Better electrostatic control -> better performance at low supply voltage
High-performance mode: wire-dominated
Low-performance mode: gate-dominated
C. H. Lin, VLSI-TSA, 2012, p. 12.
[DAC15]
Gate-Wire Balancing
Unbalanced gate-wire delay causes severe delay variation
on data and clock paths across modes
Delay variation in clock paths == skew variation
Increased difficulty for timing closure (ping-pong effect)
Minimization of skew variation is important for timing closure
(Our work at DAC15 uses global-local optimization to achieve a 22% reduction in skew variation)
Clock latency and skew by corner (skew = launch - capture):
Corner           Launch   Capture   Skew
SS, 0.7V, 25C    1.0      1.1       -0.1
FF, 1.1V, 25C    0.9      0.7       +0.2
Skew = -0.1/+0.2
Skew reversal
Low voltage: gate delay dominates
High voltage: wire delay dominates
[Figure: launch path and capture path clocking a datapath, annotated with per-corner clock latencies]
Power/area overheads
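As a rough illustration of the corner latencies on this slide, the sketch below recomputes per-corner skew as launch latency minus capture latency and reports the spread across corners. The max-minus-min spread is a simplified stand-in chosen for this example, not necessarily the skew-variation metric used in the DAC15 work, and the function names are hypothetical.

```python
# Clock latencies per corner, copied from the slide (units as on the slide).
corners = {
    "SS, 0.7V, 25C": {"launch": 1.0, "capture": 1.1},
    "FF, 1.1V, 25C": {"launch": 0.9, "capture": 0.7},
}


def skew(latency):
    """Skew of one launch/capture pair: launch latency minus capture latency."""
    return latency["launch"] - latency["capture"]


def skew_spread(corner_latencies):
    """Assumed metric for this sketch: max skew minus min skew across corners."""
    values = [skew(lat) for lat in corner_latencies.values()]
    return max(values) - min(values)


# Round to suppress floating-point noise in the printed values.
skews = {name: round(skew(lat), 3) for name, lat in corners.items()}
spread = round(skew_spread(corners), 3)

# Skew reversal: the sign of the skew differs between corners.
reversal = min(skews.values()) < 0 < max(skews.values())

print(skews)     # per-corner skew
print(spread)    # spread of skew across the two corners
print(reversal)  # True when the skew changes sign between corners
```

A sign change between corners is what makes closure ping-pong: a fix that helps setup at the slow, low-voltage corner can hurt at the fast, high-voltage corner.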
FinFET: Less Body Effect, Richer Libraries?
FinFET 4-input NAND ~ planar bulk 3-input NAND
More complex cells / higher fan-in cells could be made available to synthesis
[Figure: w/ body effect; number of fan-in limited by body effect]
Bulk FinFETs: Fundamentals, Modeling, and Application, Jong-Ho Lee, SNU
Limited pin access with small track cells
Wider power rail for reliable connection -> fewer pin access points
Complex design rules + less pin access -> pin accessibility problem
Conflict between area reduction and routability
[Figure: 9T NAND2 layout (Fin, Poly, M1, V1, M2) showing access points and difficulty in routing; annotated rule checks: < MinOverlap, < MinSpacing, metal pitch < via pitch]
Noise and Chaos
[ISQED02] http://vlsicad.ucsd.edu/Publications/Conferences/131/c131.pdf
[ISQED10] http://vlsicad.ucsd.edu/Publications/Conferences/267/c267.pdf
[Figure: floorplan with SRAM blockages separated by sram_pitch and a placement region for standard cells]
[Figure: WNS of paths through SRAMs (ns) vs. SRAM pitch (um, 0 to 30); series slack1 to slack5]
[Figure: max delta path slack between SI and non-SI modes (ns, axis 0.06 to 0.15) vs. clock period (ns, 0.80 to 1.30); annotations: 143ps at tighter clock period, 81ps at signoff clock period]
[SLIP15]
Non-SI vs. SI
Top-1000 critical paths from Viterbi design (clock period = 1.0ns)
Slack diverges by 81ps!!! ~4 stages of logic at 28nm FDSOI
Unfortunately, we don't know coupling before routing!!!
[Figure: path slack in non-SI mode (ns) vs. path slack in SI mode (ns); ideal-correlation line; 81ps divergence marked]
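To make the non-SI vs. SI comparison concrete, here is a minimal sketch that computes per-path slack divergence and shows how the most critical path can change once coupling is included. The path names and slack values are made up for illustration and are not the Viterbi data from the slide; the helper names are hypothetical.

```python
def slack_divergence(non_si_ps, si_ps):
    """Per-path slack lost when SI (coupling) effects are included, in ps."""
    return {path: non_si_ps[path] - si_ps[path] for path in non_si_ps}


def criticality_order(slack_ps):
    """Paths sorted from most critical (smallest slack) to least critical."""
    return sorted(slack_ps, key=slack_ps.get)


# Illustrative slacks in ps (hypothetical values, not measured data).
non_si_ps = {"path_a": 120, "path_b": 95, "path_c": 60}
si_ps = {"path_a": 100, "path_b": 30, "path_c": 55}

divergence = slack_divergence(non_si_ps, si_ps)
worst_path = max(divergence, key=divergence.get)

print(divergence)                    # slack lost per path
print(worst_path)                    # path with the largest divergence
print(criticality_order(non_si_ps))  # ordering seen before routing
print(criticality_order(si_ps))      # ordering seen with coupling
```

In this toy data the path that looks second-most critical without SI becomes the most critical with SI, which is the kind of mis-ranking an optimizer suffers when coupling is unknown before routing.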
[DAC15]
WLM, RC (Interconnect proxy) Effects
[Figure: 3D IC power (mW, axis 20.8 to 23) vs. WLM cap (pF, 0 to 1.2); annotated spread: 1.35mW (6.43%)]
Example: SOCE-based Shrunk2D (S2D) flow [1]
Perform synthesis with different WLM caps, P&R with S2D flow
Shown: total power (#buffers, #instances, instance area, WL: similar)
[1] Panth et al., Design and CAD Methodologies for Low Power Gate-Level Monolithic 3D ICs, Proc. ISLPED, 2014, pp. 171-176.
[Figure: values (axis 0 to 500) vs. core aspect ratio (0.1 to 10.00); labels RBM, BL, BLM, B]
[Figure: zero skew vs. useful skew; annotated values 10/0, -1000, -893]
Flow: RTL netlist -> Synthesis -> Placement / Place Opt., with back annotation
Wang et al. in DAC06 propose to back-annotate useful skew from post-CTS placement to before synthesis
[Figure: runtime (min) vs. TNS (ns), four panels: aes_cipher, des_perf, jpeg_encoder, mpeg2]
[Figure: layout with Mandrel and OD layers; marked violations: min implant width violation, DDA violation, min jog/notch width violation]
Example solution: intertwine the historically separate tasks of P&R and post-route optimization
...
Free pessimism reduction in STA
[Figure: flip-flop timing model; labels: setup, hold, c2q; fixed model]
Flexible Timing Model: Recover Margin
Independent datapaths in PBA: using fixed FF timing model loses performance optimization opportunity
[Figure: datapaths into and out of FF1; setup: 10ps, c2q: 20ps; path delays 480ps, 470ps, 470ps, 460ps; total: 500ps each]
Solution
Solve Sequential LP (STA_FTmax, STA_FTmin)
Annotate new timing model for each flip-flop
Timing signoff with annotated timing
Next
Better exploitation of disjoint cycles/modes
More accurate modeling of setup-hold-c2q tradeoff
Circuit optimization should natively exploit FF timing model flexibility
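The margin arithmetic behind the FF1 numbers on this slide can be sketched as follows, assuming a 500ps cycle, no clock skew, and setup checks only. The fixed model uses the slide's setup = 10ps and c2q = 20ps. The flexible operating point for FF1 (setup = 0ps, c2q = 30ps) is a hypothetical choice that happens to reproduce the 480ps / 460ps split; it is not taken from the slide, and the function name is illustrative.

```python
PERIOD_PS = 500  # cycle time, matching "Total: 500ps" on the slide


def max_datapath_delays(c2q_prev, setup_ff1, c2q_ff1, setup_next, period=PERIOD_PS):
    """Largest datapath delays allowed into and out of FF1 (setup checks only).

    Into FF1:   c2q_prev + d_in  + setup_ff1  <= period
    Out of FF1: c2q_ff1  + d_out + setup_next <= period
    """
    d_in = period - c2q_prev - setup_ff1
    d_out = period - c2q_ff1 - setup_next
    return d_in, d_out


# Fixed model: every flip-flop uses setup = 10ps and c2q = 20ps.
fixed = max_datapath_delays(c2q_prev=20, setup_ff1=10, c2q_ff1=20, setup_next=10)

# Flexible model (hypothetical point): FF1 trades a smaller setup time
# for a larger clock-to-q delay; neighboring flip-flops are unchanged.
flexible = max_datapath_delays(c2q_prev=20, setup_ff1=0, c2q_ff1=30, setup_next=10)

print(fixed)     # equal budgets on both sides of FF1
print(flexible)  # budget shifted toward the incoming path
```

Under the fixed model a 480ps incoming path fails even if the outgoing path needs only 460ps; letting FF1 move along its setup-c2q tradeoff shifts the unused 10ps to the side that needs it.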
Overly pessimistic aging library -> large area penalty
Our method finds knee point for balanced area and power tradeoff
Experiment setup:
DC/AC BTI @ 125C
32nm PTM technology
4 benchmark circuit implementations
Technology: 28nm LP
In red are 12T cells = larger area, smaller delay
In blue are 8T cells = smaller area, larger delay
X-directional shift: four sites
Y-directional shift: one M2 pitch
[Figure: 8T cell next to 12T cell; dimension labels 64nm, 48nm, 64nm]
Assume: M2 pitch = 64nm
Technology: 28nm LP
Design: AES
8T cells are in blue
12T cells are in red
[Figure: AES layout; regions labeled LS]