SoCDesign PDF
ICE of silicon
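[Figure: computational efficiency [MOPS/W] on a logarithmic axis from 10^0 to 10^6, plotted against feature size [µm] from 2 down to 0.07, with the intrinsic computational efficiency of silicon as reference. Plotted processors include i386SX, i486DX, 68040, microsparc, Super sparc, Turbosparc, P5, P6, 601, 604, 604e, Ultra sparc, 21164a, 21364 and 7400. Labels near the top of the axis include 3DTV and Query by humming. Source: [Roza]]
http://bwrc.eecs.berkeley.edu/cic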
Designing Embedded Systems on Silicon-1
J. van Meerbergen 2/7/13
Hardware Efficiency
[Figure: efficiency scale (high, medium, low) with the implementation styles ASIC, ASIP, DSP, GP proc and FPGA]
A Finite Impulse Response (FIR) filter
[Figure: block diagram with blocks labelled CPU, MEM, ASIC and Accel]
• No picture
• Synthesis
• DFT Insertion
• Floorplanning
• Power Planning
• Clock tree insertion
• Place and Route
• RC extraction
• Timing check
Design Tools
• Place & Route
• RC Extraction
• Static Timing Analysis
• DRC/LVS
Flow with Multi-Vendor Tools
Design Abstraction Levels
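SYSTEM
MODULE
GATE
CIRCUIT
DEVICE
[Figure: each abstraction level illustrated, down to a MOS device with gate (G), source (S), drain (D) and n+ regions]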
[Figure: impact of a design decision versus complexity across abstraction levels: conceptual level, high level, RT level, gate level, transistor level]
RTL Coding
• RTL stands for Register Transfer Level
• An RTL description describes the design in terms of registers and the logic that resides between them
• This captures the timing constraints of the design efficiently
• Verilog and VHDL are the two most popular hardware description languages commonly used to write RTL descriptions
• An RTL description captures the change in data at each clock cycle
• All the registers are updated at the same time in a clock cycle
Sample RTL code
    if IR(3) = '0' then
        PC := PC + 1;
    else
        DBUF := MEM(PC);
        MEM(SP) := PC + 1;
        SP := SP - 1;
    end if;
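The statement that all registers are updated at the same time can be illustrated with a small software model of the sample RTL code. This is only a sketch under stated assumptions: IR(3) is treated as bit 3 of an integer, MEM is modelled as a Python dict, register widths are ignored, and the name clock_cycle is hypothetical.

```python
def clock_cycle(state):
    """Advance the sample RTL by one clock edge.

    Every right-hand side is read from the current state and written to a
    fresh copy, so all registers change at the same instant.
    """
    nxt = dict(state, MEM=dict(state["MEM"]))  # next-state copy
    pc, sp = state["PC"], state["SP"]
    if (state["IR"] >> 3) & 1 == 0:            # IR(3) = '0'
        nxt["PC"] = pc + 1
    else:
        nxt["DBUF"] = state["MEM"].get(pc, 0)  # DBUF := MEM(PC)
        nxt["MEM"][sp] = pc + 1                # uses the old SP, not SP - 1
        nxt["SP"] = sp - 1
    return nxt


# Hypothetical starting values, chosen only for illustration.
state = {"IR": 0b1000, "PC": 4, "SP": 15, "DBUF": 0, "MEM": {4: 0xAB}}
after = clock_cycle(state)
print(after["DBUF"], after["MEM"][15], after["SP"], after["PC"])
```

Because MEM(SP) is written using the value SP had before the clock edge, the order of the three assignments in the else branch does not matter, which is the point of the register-transfer view.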
[Figure: logic synthesis inputs: RTL, user constraints and ASIC cell library]
Logic Synthesis: Technology Mapping
Z = (not S and A) or (S and B)
[Figure: the function drawn first with generic gates, then mapped to the standard cells I-002 and ANDOR-001; inputs A, B and S, output Z]
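Technology mapping must preserve the logic function. The sketch below checks this exhaustively for the multiplexer equation Z = (not S and A) or (S and B). The cell names come from the figure, but their behaviour is an assumption: I-002 is modelled as an inverter and ANDOR-001 as a 2-2 AND-OR cell.

```python
from itertools import product


def z_equation(s, a, b):
    """The original equation: Z = (not S and A) or (S and B)."""
    return bool(((not s) and a) or (s and b))


def inv(x):
    """Assumed behaviour of cell I-002: an inverter."""
    return not x


def and_or(a1, a2, b1, b2):
    """Assumed behaviour of cell ANDOR-001: (a1 and a2) or (b1 and b2)."""
    return bool((a1 and a2) or (b1 and b2))


def z_mapped(s, a, b):
    """The same function built only from the two assumed standard cells."""
    return and_or(inv(s), a, s, b)


def equivalent():
    """Compare both versions over all eight input combinations."""
    return all(
        z_equation(s, a, b) == z_mapped(s, a, b)
        for s, a, b in product([False, True], repeat=3)
    )


print(equivalent())
```

With only three inputs an exhaustive truth-table comparison is enough; real flows use formal equivalence checking for the same purpose.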
DfT Insertion
test validation
Handoff deliverables
Backend Design
• Technology Information and Physical Libraries
– Corelib.lef
– IOlib.lef
– Rams.vclef
• Timing libraries
– Corelib_slow.lib
– Corelib_fast.lib
– Corelib_typ.lib
– IOlib_slow.lib
– RAM timing libraries
• Timing constraints (user defined)
• Design Netlist
– Add IO pads, power pads
– Verilog design netlist
• IO pad location file
[Figure: backend flow in three stages. Chip Physical Architecture: I/O planning, power grid design, hierarchical floorplan, analysis, chip assembly, STA, implementation. Physical Synthesis: placement, DFT, clock tree synthesis, post-placement optimisation. Routing and Final Optimisation: signal routing, crosstalk fixing, post-route fix, antennas, editing, decap, fillers]
Floorplanning
• Floorplanning is the task of deciding how the chip area is to be utilized by the leaf modules, taking wiring considerations into account
• Two methods of floorplanning:
– Top down: the chip is partitioned during the development of the RTL-level model. Area is assigned on the basis of estimated block areas and shapes, and blocks are placed relative to each other depending on connectivity.
– Bottom up: the design is first synthesised, and the resultant gates are then clustered into blocks on the basis of connectivity.
• Most designs use a combination of both techniques, but the emphasis is increasingly on the first.
[Figure: example floorplan showing standard cells, an IP block and pads]
Floorplanning
• Calculating core size, width and height
• When calculating the core size for standard cells, the core utilization must be decided first. Usually the core utilization is higher than 85%
• The core size is calculated as follows: core size = standard cell area / core utilization
Example
• Standard cell area = 2,000,000 µm²
• Core utilization demanded = 85%
• No macros
• Core size of standard cells = 2,000,000 / 0.85 = 2,352,941 µm²
• Width = Height = (2,352,941)^0.5 ≈ 1534 µm
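The core size example can be reproduced with a few lines of Python. This is a minimal sketch for a square core without macros, as in the example; the function name core_size is hypothetical.

```python
import math


def core_size(std_cell_area_um2, utilization):
    """Return (core area in um^2, side length in um) of a square core.

    Assumes a design without macros, as in the example above.
    """
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    core_area = std_cell_area_um2 / utilization
    return core_area, math.sqrt(core_area)


area, side = core_size(2_000_000, 0.85)
print(round(area), round(side))  # 2352941 1534
```

Lowering the utilization leaves more room for routing and buffers but increases the core area in inverse proportion.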
Floorplanning
• Core margins
– Space for power and ground routing
• Core limited / pad limited designs
– When pad width > (core width + core margin), the die size is decided by the pads. This is called a pad limited design
– When pad width < (core width + core margin), the die size is decided by the core. This is called a core limited design
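The pad limited / core limited rule can be written as a simple comparison. In this sketch, pad width is taken to be the total width of the pad row along one die edge, and the core margin is the total margin added to the core width; both readings are assumptions. The equal case is not covered by the rule above and is reported separately. All numbers are hypothetical.

```python
def limiting_factor(pad_width_um, core_width_um, core_margin_um):
    """Classify a design as pad limited or core limited."""
    core_extent = core_width_um + core_margin_um
    if pad_width_um > core_extent:
        return "pad limited"
    if pad_width_um < core_extent:
        return "core limited"
    return "balanced"  # not covered by the rule above; an assumption


# Hypothetical values: a 1534 um core with a 100 um margin.
print(limiting_factor(2000, 1534, 100))  # pad limited
print(limiting_factor(1200, 1534, 100))  # core limited
```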
Power Planning
• Metal migration (also known as electromigration)
– Under high currents, electron collisions with metal grains cause the metal to move. The wire may then become an open circuit or short to a neighbouring wire.
– Prevention: size the power supply lines so that the chip does not fail
– Experience: keep the current density of the power ring below 1 mA/µm
• IR drop
– IR drop is the voltage drop on the power and ground nets caused by high current flowing through the resistive power-ground network
– When there are excessive voltage drops in the power network or voltage rises in the ground network, the device runs at a slower speed
– IR drop can cause the chip to fail due to
  • Performance (circuit running slower than specification)
  • Functionality problem (setup or hold violations)
  • Unreliable operation (less noise margin)
  • Power consumption (leakage power)
  • Latch up
– Prevention: add stripes to limit IR drop on the cells' power lines
Power Planning: IR Drop
• Number of counts inversely proportional to DSP clock frequency (TC = 1/FC)
• FC = 10, 20 and 25 MHz
• Ring oscillator (ringo) frequency ≈ 115 MHz @ VDD = 1.8 V
• DSP-induced power supply noise (PSN) is clearly detected
• Average PSN = 6 counts × 2.4 mV/count = 14.4 mV
[Figure: C2 counts vs. DSP activity (FC = 20 MHz, Tambient = 27 ºC). Horizontal axis: tester ck-cycles, 0 to 250. Vertical axis: C2 counts, 691 to 699, with Δ counts = 6. Inset: counter with enable signal v(t) of period TC]
Source: J. Rius, UPC
Voltage Drop Verification
VoltageStorm (Cadence)
[Figure: voltage drop verification flow with VoltageStorm. Labels: virtual prototype, IP block, partition; block-level analysis producing a Power Grid View library; top-level analysis (flat implementation) with Encounter Power Analysis; CreateChip; top-level and block-level PG; hierarchy sign-off analysis; results displayed in the SoC Encounter interface]
Power Grid Design
[Figure: power grid design flow; box labels include power plan, power grid creation, power grid connect, power/ground routing, propagation, analysis and refinement]
Power Ring Width
Experience
• Gate count = 70 k
• 4000 flip-flops
• 80% of flip-flops with dynamic gated clock
• Current needed = 0.2 mA/MHz
– Note: multiply this value by 1.8 to 2 for a design without gated clocks
Example:
• Gate count = 200 k
• No gated clock
• Clock frequency = 20 MHz
• Current needed = (200/70) * 0.2 * 20 * 2 = 22.86 mA
• Current density < 1 mA/µm
• The width of the P/G ring > 22.86 µm
• To avoid the slot rule for wide metal, the largest width is 20 µm (process dependent)
• Use two sets of P/G rings for this case
Power Stripe Calculation
Experience
• Add one stripe set per 100 µm
Example
• Core width = height = 1600 µm
• Stripe sets added = 15
[Figure: core with standard cells and a low utilization region]
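One reading that is consistent with the example (15 stripe sets for a 1600 µm core) is that stripe sets are placed every 100 µm strictly inside the core, with the core edges served by the power ring. That interpretation is an assumption, as is the function name in the sketch below.

```python
import math


def stripe_sets(core_width_um, pitch_um=100.0):
    """Number of stripe sets placed every pitch_um strictly inside the core.

    Assumes the core edges are already covered by the power ring.
    """
    return max(math.ceil(core_width_um / pitch_um) - 1, 0)


print(stripe_sets(1600))  # 15
```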
Placement
Source: Magma
Clock Tree Synthesis
• The clock signal is used as a timing reference in a synchronous digital system for the movement of data within that system.
• The clock tree or clock distribution network distributes the clock signal(s) from a common point to all the elements that need it
• Properties of clock signals
– They are loaded with the greatest fanout
– They travel over the greatest distances
– They operate at the highest speeds
• The goals of clock tree synthesis include
– Creating a clock tree spec file
– Building a buffer distribution network
• In automatic CTS mode, Encounter will do the following things
– Build the clock buffer tree according to the clock tree specification file
– Balance the clock phase delay with appropriately sized, inserted clock buffers
Clock Tree Synthesis
Routing
• Routing is the process of building the
physical connections between blocks
as defined by the logical connections.
• Routing takes place in more than one
layer, the exact number available
depending on the process and design
conventions.
• Layers are connected together using
vias
• Global Routing
– Assigns wires to channels
defined during the floor
planning phase
• Detailed Routing
– Assigns nets to individual
tracks in the channel
[Figure: crosstalk test structure with pico pad, T1IN, driver, 20 mm wire with shield wire and aggressor, receiver, bfx4 and T1OUT; propagation delay plotted against wire length]
Verification Signoff
Power Distribution Analysis
Parasitic Extraction
Static Timing Analysis
• This involves three main steps:
– Design is broken down into sets of timing paths
– The delay of each path is calculated
– All path delays are checked to see if timing constraints have been met
[Figure: flip-flop (D, Q, CLK) between input A and output Z, with timing paths Path 1, Path 2 and Path 3]
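The three steps can be sketched in a few lines of Python. This is a strongly simplified illustration, not how a sign-off tool works: paths are listed by hand, a single required time applies to every path, setup and hold are not distinguished, and all delay values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TimingPath:
    """Step 1: one timing path, described by its cell and wire delays."""
    name: str
    cell_delays_ns: list
    wire_delays_ns: list

    def delay_ns(self):
        """Step 2: the path delay is the sum of all stage delays."""
        return sum(self.cell_delays_ns) + sum(self.wire_delays_ns)


def check_paths(paths, required_ns):
    """Step 3: return the slack per path; negative slack is a violation."""
    return {p.name: required_ns - p.delay_ns() for p in paths}


# Hypothetical delays for the three paths named in the figure.
paths = [
    TimingPath("Path 1", [0.5, 0.75], [0.25]),
    TimingPath("Path 2", [1.0, 1.5], [0.75]),
    TimingPath("Path 3", [0.25], [0.25]),
]
slack = check_paths(paths, required_ns=3.0)
violations = [name for name, s in slack.items() if s < 0]
print(violations)  # ['Path 2']
```

Because every path is checked against its constraint without simulating input vectors, the analysis is called static.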
• DRC
– Design Rule Checking
• LVS
– Layout vs. Schematic verification
Chip Finishing tiles
• Selection of appropriate
package
• Route pads to pins
– Wire length is important
– Rule checking
• GDS2 minimum required
information is the nitride or
pad opening layer or the
pad boundary layer
Packaging