Voltage Scaling
Reducing the power supply voltage is an effective technique for reducing dynamic power,
but it comes with a speed penalty. Keeping all other factors constant, if the supply voltage
is scaled down, the propagation delay increases. This can be compensated by scaling down
the threshold voltage to the same extent as the supply voltage, which allows the circuit to
deliver the same speed performance at a lower Vdd. However, smaller threshold voltages
lead to smaller noise margins and increased leakage current.
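To make the trade-off concrete, the sketch below combines the textbook switching-power relation P = activity·C·Vdd²·f with the alpha-power-law delay model (delay proportional to Vdd/(Vdd - Vt)^a). The alpha-power model, the exponent a = 1.3 and all voltages are assumptions chosen for illustration; they are not values from this article.

```python
def dynamic_power(activity, c_load, vdd, freq):
    """Switching power P = activity * C * Vdd^2 * f (watts)."""
    return activity * c_load * vdd ** 2 * freq


def gate_delay(vdd, vt, a=1.3, k=1.0):
    """Alpha-power-law delay estimate t ~ k * Vdd / (Vdd - Vt)^a (assumed model, arbitrary units)."""
    if vdd <= vt:
        raise ValueError("the alpha-power model only applies above threshold")
    return k * vdd / (vdd - vt) ** a


# Hypothetical operating points: nominal 1.2 V with Vt = 0.4 V, scaled down to 0.9 V
base_delay = gate_delay(1.2, 0.4)
power_ratio = dynamic_power(0.1, 1e-12, 0.9, 1e9) / dynamic_power(0.1, 1e-12, 1.2, 1e9)
delay_ratio_fixed_vt = gate_delay(0.9, 0.4) / base_delay   # Vdd scaled, Vt unchanged
delay_ratio_scaled_vt = gate_delay(0.9, 0.3) / base_delay  # Vt scaled by the same factor

print(f"dynamic power ratio:    {power_ratio:.4f}")
print(f"delay ratio, Vt fixed:  {delay_ratio_fixed_vt:.3f}")
print(f"delay ratio, Vt scaled: {delay_ratio_scaled_vt:.3f}")
```

With these assumed numbers dynamic power falls to about 56% of nominal; keeping Vt fixed slows the gate by roughly 38%, while scaling Vt by the same factor reduces the penalty to roughly 9%. Under this particular model the compensation is close but not exact, and the lower Vt raises leakage, as noted above.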
The relationship between energy and supply voltage implied above does not always hold. The
authors of [1] showed that the quadratic relationship between energy and Vdd breaks down
as Vdd is scaled down into the sub-threshold region. Sub-threshold current depends
exponentially on the supply voltage, and since in sub-threshold operation the on-current is
itself a sub-threshold current, delay increases exponentially as the supply voltage is scaled
down. At very low voltages dynamic energy still falls quadratically, but leakage energy
increases as the supply voltage is reduced, because leakage energy is proportional to the
circuit delay. Hence dynamic and leakage energy become comparable in the sub-threshold
region.
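A toy model shows why total energy stops falling. Assume (a deliberate simplification) that the circuit behaves as if it were sub-threshold across the whole sweep: the on-current is proportional to exp((Vdd - Vt)/(n·vT)), so delay is proportional to C·Vdd/I_on, and leakage energy per operation is I_leak·Vdd·delay, which is proportional to Vdd²·exp(-Vdd/(n·vT)). The constants N_SLOPE and K_LEAK below are hypothetical, and the sweep stops at 0.2 V because the model is not meaningful at only a few vT.

```python
import math

V_THERMAL = 0.026  # kT/q at room temperature, volts
N_SLOPE = 1.5      # hypothetical sub-threshold slope factor
K_LEAK = 1000.0    # hypothetical leakage weight (e.g. many idle gates per switching gate)


def energy_per_op(vdd, c_eff=1.0):
    """Return (dynamic, leakage) energy per operation in arbitrary units.

    Dynamic energy ~ C * Vdd^2. Leakage energy = I_leak * Vdd * delay, and with an
    exponential on-current this reduces to ~ Vdd^2 * exp(-Vdd / (n * vT));
    the Vt-dependent factors are folded into K_LEAK.
    """
    e_dyn = c_eff * vdd ** 2
    e_leak = K_LEAK * c_eff * vdd ** 2 * math.exp(-vdd / (N_SLOPE * V_THERMAL))
    return e_dyn, e_leak


# Sweep 0.20 V .. 1.00 V in 10 mV steps and pick the minimum-energy supply voltage.
sweep = [round(0.20 + 0.01 * i, 2) for i in range(81)]
v_opt = min(sweep, key=lambda v: sum(energy_per_op(v)))

for v in (1.0, 0.5, v_opt, 0.2):
    dyn, leak = energy_per_op(v)
    print(f"Vdd={v:.2f} V  dynamic={dyn:.4f}  leakage={leak:.4f}  total={dyn + leak:.4f}")
```

With these assumed constants the minimum lands at around 0.3 V; below it the growing leakage term outweighs the quadratic savings in dynamic energy.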
According to Bo Zhai et al. [1], dynamic voltage and frequency scaling (DVFS) is a very
popular low-power technique, but larger voltage ranges do not necessarily improve energy
efficiency. They showed that at sub-threshold supply voltages leakage energy becomes
dominant, making "just in time completion" energy inefficient. They also showed that
extending the voltage range below half of Vdd improves energy efficiency for most processor
designs, while extending it into sub-threshold operation is beneficial only for specific
applications. One important conclusion of their study is that DVFS within the sub-threshold
voltage range is never energy efficient.
References
[1] Bo Zhai, David Blaauw, Dennis Sylvester and Krisztian Flautner, "Theoretical and
Practical Limits of Dynamic Voltage Scaling", Proc. DAC, San Diego, California, USA,
pp. 868-873, June 7-11, 2004.
It is always interesting to talk about setup and hold!! If anybody asks questions related
to setup time and hold time, don’t think that he or she doesn’t know about setup and hold.
He or she may know everything about setup time and hold time, yet at times the topic still
confuses. “Setup” and “hold” are terms in the VLSI/ASIC design world that create endless
questions and are hard to explain in words, at least as far as I am concerned! I remember
that during my MTech days my professor always used to say, "The whole VLSI world depends
on two pillars: setup time and hold time." It would be more realistic to say that he used
to scold us with it!!
06 June 2009
Timing paths
A timing path is defined as the path between a start point and an end point, where the
start point and end point are defined as follows:
Start Point:
All input ports and the clock pins of sequential elements are valid start points.
End Point:
All output ports and the D pins of sequential elements are valid end points.
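The start/end point definition can be illustrated by enumerating paths in a small netlist graph. The sketch below uses a hypothetical netlist (pin names such as U1/A and FF1/CK are made up) and a depth-first walk from every start point that stops at the first end point reached; it assumes there are no combinational loops. The result contains the four usual path groups: input-to-register, register-to-register, register-to-output and input-to-output.

```python
from collections import defaultdict

# Hypothetical netlist as (driver pin, load pin) arcs; FF1/CK -> FF1/Q is the launch arc.
ARCS = [
    ("in1", "U1/A"), ("U1/A", "U1/Y"), ("U1/Y", "FF1/D"),
    ("FF1/CK", "FF1/Q"), ("FF1/Q", "U2/A"), ("U2/A", "U2/Y"),
    ("in2", "U2/B"), ("U2/B", "U2/Y"),
    ("U2/Y", "FF2/D"), ("U2/Y", "out1"),
]
START_POINTS = {"in1", "in2", "FF1/CK", "FF2/CK"}  # input ports and clock pins
END_POINTS = {"out1", "FF1/D", "FF2/D"}            # output ports and D pins


def timing_paths(arcs, starts, ends):
    """Enumerate every start-point-to-end-point path (assumes no combinational loops)."""
    fanout = defaultdict(list)
    for driver, load in arcs:
        fanout[driver].append(load)

    paths = []

    def walk(pin, path):
        if pin in ends:  # a timing path stops at the first end point it reaches
            paths.append(path)
            return
        for nxt in fanout[pin]:
            walk(nxt, path + [nxt])

    for start in sorted(starts):
        walk(start, [start])
    return paths


for p in timing_paths(ARCS, START_POINTS, END_POINTS):
    print(" -> ".join(p))
```

FF2/CK produces no path here because FF2/Q is left unconnected in this toy netlist.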
16 December 2008
Transition Delay
Transition delay or slew is defined as the time taken by a signal to rise from 10% (or 20%)
to 90% (or 80%) of its maximum value. This is known as the “rise time”.
Similarly, the “fall time” is the time taken by a signal to fall from 90% (or 80%) to 10%
(or 20%) of its maximum value.
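Measured on a sampled waveform, the rise and fall time definitions reduce to finding two threshold crossings. The sketch below assumes the waveform starts (or ends) at 0 V and locates each crossing by linear interpolation between samples; the lo/hi arguments select 10%/90% or 20%/80% thresholds. The RC step response used as a check has the well-known 10%-90% rise time τ·ln 9.

```python
import math


def crossing_time(times, values, level):
    """First time a rising waveform crosses `level`, by linear interpolation between samples."""
    for i in range(1, len(values)):
        v0, v1 = values[i - 1], values[i]
        if v0 < level <= v1:
            t0, t1 = times[i - 1], times[i]
            return t0 + (level - v0) * (t1 - t0) / (v1 - v0)
    raise ValueError("waveform never crosses the requested level")


def rise_time(times, values, lo=0.1, hi=0.9):
    """Time to go from lo*Vmax to hi*Vmax (10%-90% by default; pass 0.2, 0.8 for 20%-80%)."""
    vmax = max(values)
    return crossing_time(times, values, hi * vmax) - crossing_time(times, values, lo * vmax)


def fall_time(times, values, lo=0.1, hi=0.9):
    """Time to fall from hi*Vmax to lo*Vmax, found by mirroring the waveform into a rising one."""
    vmax = max(values)
    mirrored = [vmax - v for v in values]
    # Falling through x*Vmax is the same instant as the mirror rising through (1 - x)*Vmax.
    return crossing_time(times, mirrored, (1 - lo) * vmax) - crossing_time(times, mirrored, (1 - hi) * vmax)


# RC step response with tau = 1 (arbitrary time unit), sampled every 0.001
ts = [i * 0.001 for i in range(10001)]
rising = [1.0 - math.exp(-t) for t in ts]
falling = [math.exp(-t) for t in ts]
print(rise_time(ts, rising), fall_time(ts, falling), math.log(9))
```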
Transition-related constraints can be provided in Design Compiler (the logic synthesis tool
from Synopsys) using the commands below:
For example, to set a maximum transition time of 3.2 on all nets in the design adder, enter
the following command:
This command applies a maximum capacitance limit to an output pin or port of the design.
It can also be used to apply a capacitance limit to any net.
Eg:
Propagation Delay
Propagation delay is the time required for a signal to propagate through a gate or net.
For a cell it is called “gate delay” or “cell delay”; for a net it is called “net delay”.
The propagation delay of a gate or cell is the time it takes for a signal at the input pin
to affect the output signal at the output pin.
For any gate, propagation delay is measured from the 50% point of the input transition to
the corresponding 50% point of the output transition.
For a net, propagation delay is the delay between the time a signal is first applied to the
net and the time it reaches the other devices connected to that net.
The overall propagation delay is often taken as the average of the high-to-low and
low-to-high propagation delays, i.e. Tpd = (Tphl + Tplh)/2.
Propagation delay depends on the input transition time (slew rate) and the output load,
so two-dimensional lookup tables are used to calculate these delays. For a detailed
explanation of how to calculate the propagation delay of nets and gates, please refer to
the articles below.
Contamination Delay:
The best-case delay from a valid input to a valid output, i.e. the minimum propagation delay.
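The 50%-to-50% measurement and the Tpd average can be written out directly. The crossing times below are hypothetical measurements on an inverter. Only two arcs are measured, so the contamination delay here is simply the smaller of the two; a real library characterizes many slew and load combinations.

```python
def delay_50(t_in_50, t_out_50):
    """Delay from the 50% point of the input transition to the 50% point of the output."""
    return t_out_50 - t_in_50


# Hypothetical 50%-crossing times (ns) measured on an inverter
tphl = delay_50(t_in_50=1.00, t_out_50=1.12)  # input rises, output falls
tplh = delay_50(t_in_50=3.00, t_out_50=3.18)  # input falls, output rises

tpd = (tphl + tplh) / 2  # average propagation delay
tcd = min(tphl, tplh)    # best case over the arcs measured here

print(f"tphl={tphl:.3f} ns  tplh={tplh:.3f} ns  tpd={tpd:.3f} ns  tcd={tcd:.3f} ns")
```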
14 October 2008
Net delay is the difference between the time a signal is first applied to the net and the
time it reaches the other devices connected to that net.
It is due to the finite resistance and capacitance of the net and is also known as wire delay.
It is measured from the output pin of a cell to the input pin of the next cell.
Net delay depends on:
• Net length
• Net cross-sectional area
• Resistivity of the material used for the metal layers (aluminum vs. copper)
• Number of vias traversed by the net
• Proximity to other nets (crosstalk)
Post-layout, the design is annotated with RCs extracted from the layout for better accuracy.
Annotated RCs override the information from the WLM (wire load model).
Interconnect introduces capacitive, resistive and inductive parasitics. All three have
multiple effects on circuit behavior.
Dominant parameters determine the circuit behavior at a given circuit node; non-dominant
parameters can be neglected for interconnect analysis.
• The inductive effect can be ignored if the resistance of the wire is substantial enough;
this is the case for long aluminum wires with a small cross section, or if the rise
and fall times of the applied signals are slow.
• When the wires are short, the cross section of the wire is large, or the interconnect
material used has a low resistivity, a capacitance-only model can be used.
• When the separation between neighboring wires is large, or when the wires only
run together for a short distance, inter-wire capacitance can be ignored, and all the
parasitic capacitance can be modeled as capacitance to ground.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~
Capacitance
Capacitance can be modeled by the parallel-plate capacitor model:
C = (ε / t) · W · L
where ε is the permittivity of the dielectric, t is the dielectric thickness, and W and L are
the width and length of the wire.
Higher metal layers are generally thicker (i.e. taller), and higher dielectric layers have
higher permittivity, so these wires display the highest inter-wire capacitance. Hence they
are best used for global signals that are not sensitive to interference (e.g. supply rails);
otherwise it is advisable to separate such wires by more than the minimum spacing.
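A quick numeric check of the parallel-plate formula C = (ε/t)·W·L. The wire dimensions and the SiO2 relative permittivity of about 3.9 are illustrative assumptions; the model ignores fringing fields and coupling to neighbouring wires, so real wire capacitance is higher.

```python
EPS0 = 8.854e-12  # vacuum permittivity, F/m


def parallel_plate_cap(eps_r, t_dielectric, width, length):
    """C = (eps / t) * W * L in farads; ignores fringing and inter-wire coupling."""
    return eps_r * EPS0 / t_dielectric * width * length


# Hypothetical wire: 1 um wide, 1 mm long, over 1 um of SiO2 (eps_r ~ 3.9)
c = parallel_plate_cap(eps_r=3.9, t_dielectric=1e-6, width=1e-6, length=1e-3)
print(f"{c * 1e15:.1f} fF")  # about 34.5 fF
```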
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~
Resistance
R = ρ · L / (H · W)
where
ρ --> resistivity of the metal
L --> length
H --> thickness (height)
W --> width
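As a rough sketch, assuming the usual wire resistance relation R = ρ·L/(H·W) with H the wire thickness, the resistance can also be expressed as a sheet resistance ρ/H multiplied by the number of squares L/W. The copper resistivity (about 1.7e-8 ohm·m) and the wire dimensions are illustrative assumptions.

```python
def sheet_resistance(rho, thickness):
    """R_square = rho / H, in ohms per square."""
    return rho / thickness


def wire_resistance(rho, length, thickness, width):
    """R = rho * L / (H * W) = R_square * (L / W)."""
    return sheet_resistance(rho, thickness) * length / width


# Hypothetical copper wire: 1 mm long, 0.5 um thick, 0.5 um wide (2000 squares)
r = wire_resistance(rho=1.7e-8, length=1e-3, thickness=0.5e-6, width=0.5e-6)
print(f"{r:.1f} ohm")
```

Doubling the width halves the resistance, which is why wide wires are used for clocks and power.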
At very high frequencies “skin effect” comes into play such that the resistance becomes
frequency dependent. High frequency currents tend to flow primarily on the surface of a
conductor, with the current density falling off exponentially with depth into the
conductor.
The skin effect is only an issue for wider wires. Since clocks tend to carry the
highest-frequency signals on a chip and are also fairly wide to limit resistance, the skin
effect is likely to have its first impact on these lines.
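The exponential fall-off of current density is usually summarized by the skin depth δ = sqrt(ρ/(π·f·μ)), the depth at which current density drops to 1/e of its surface value. Comparing δ with a wire's width and thickness shows why only wide, high-frequency lines are affected. The copper resistivity used here is an assumed round value.

```python
import math

MU0 = 4e-7 * math.pi  # vacuum permeability, H/m


def skin_depth(rho, freq, mu_r=1.0):
    """Depth (m) at which current density falls to 1/e of its surface value."""
    return math.sqrt(rho / (math.pi * freq * mu_r * MU0))


# Copper (rho ~ 1.7e-8 ohm*m) at a few clock frequencies
for f in (100e6, 1e9, 10e9):
    print(f"{f / 1e9:5.1f} GHz: {skin_depth(1.7e-8, f) * 1e6:.2f} um")
```

At 1 GHz the depth is about 2 µm, so only wires whose cross-section is comparable to or larger than that see a noticeable increase in effective resistance.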
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~
Inductance
With the adoption of low-resistance interconnect materials and the increase of switching
frequencies into the GHz range, inductance starts to play an important role. Consequences
of on-chip inductance include ringing and overshoot, reflection of signals due to
impedance mismatch, inductive coupling between lines, and switching noise due to
(L·di/dt) voltage drops.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~
Lumped RC Model
If the wire length is more than a few millimeters, the lumped capacitance model is
inadequate and a resistive-capacitive model has to be adopted.
In the lumped RC model, the total resistance of each wire segment is lumped into one single
R, and the global capacitance is combined into a single capacitor C.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~
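“Path resistance” is the resistance from the source node to any other node.
“Shared path resistance” R_ik is the resistance shared among the paths from the source node
to any two nodes i and k.
Hence the Elmore delay at node i of an RC network with N nodes is

$$\tau_{di} = \sum_{k=1}^{N} C_k R_{ik}$$

For an RC chain, the delay at the end node N is

$$\tau_{dN} = R_1C_1 + (R_1+R_2)C_2 + \cdots + (R_1+R_2+\cdots+R_N)C_N$$

If R1 = R2 = ... = R and C1 = C2 = ... = C, then

$$\tau_{dN} = RC + 2RC + \cdots + NRC = \frac{N(N+1)}{2}\,RC$$

Thus the Elmore delay is equivalent to the first-order time constant of the network.
Assume an interconnect wire of length L is partitioned into N identical segments, each of
length L/N, with resistance r and capacitance c per unit length. Then

$$\tau_{dN} = \left(\frac{L}{N}\right)^2 (rc + 2rc + \cdots + Nrc) = \left(\frac{L}{N}\right)^2 rc\,\frac{N(N+1)}{2} = rcL^2\,\frac{N+1}{2N}$$

or, for large N,

$$\tau_d = \frac{rcL^2}{2}$$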
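The Elmore sums for an RC chain can be checked numerically. The sketch below, using hypothetical unit values, computes the end-node delay of an RC ladder and of a uniform wire split into N identical segments; as N grows the result approaches rcL²/2, with r and c the per-unit-length resistance and capacitance.

```python
def elmore_chain(resistances, capacitances):
    """Elmore delay at the far end of an RC ladder: sum over i of C_i * (R_1 + ... + R_i)."""
    if len(resistances) != len(capacitances):
        raise ValueError("expected one R and one C per segment")
    delay = 0.0
    path_r = 0.0
    for r, c in zip(resistances, capacitances):
        path_r += r          # resistance from the source to node i
        delay += path_r * c  # path shared with the end node is the whole path to node i
    return delay


def elmore_wire(r_per_len, c_per_len, length, n_segments):
    """Split a uniform wire into N identical RC segments and apply elmore_chain."""
    seg_r = r_per_len * length / n_segments
    seg_c = c_per_len * length / n_segments
    return elmore_chain([seg_r] * n_segments, [seg_c] * n_segments)


print(elmore_chain([1.0] * 4, [1.0] * 4))  # N(N+1)/2 * RC with N = 4: 10.0
for n in (1, 10, 100, 1000):
    print(n, elmore_wire(1.0, 1.0, 1.0, n))  # tends to r*c*L^2/2 = 0.5
```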
Advantages
• It is simple
• It is always situated between minimum and maximum bounds
Disadvantages
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~
Distributed RC model
The lumped RC model is always pessimistic, and the distributed RC model provides better
accuracy.
However, the distributed RC model is complex and no closed-form solution exists, so the
distributed RC line model is not well suited to Computer-Aided Design tools.
Lossless transmission line model: this is good for Printed Circuit Board level design.
Lossy transmission line model: this model is used for IC interconnect.
Transmission line effects should be considered when the rise or fall time of the input
signal is smaller than the time of flight of the transmission line and the resistance of the
wire is less than the characteristic impedance.
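The two transmission-line conditions can be evaluated with the standard lossless-line relations Z0 = sqrt(l/c) and t_flight = length·sqrt(l·c), where l and c are inductance and capacitance per unit length. This sketch uses hypothetical line parameters and reports the two conditions separately rather than combining them; practical rules of thumb often add margin factors to both comparisons.

```python
import math


def line_checks(t_rise, length, r_per_len, l_per_len, c_per_len):
    """Return (edge_faster_than_flight, resistance_below_z0) for a uniform line (SI units)."""
    t_flight = length * math.sqrt(l_per_len * c_per_len)  # length / propagation velocity
    z0 = math.sqrt(l_per_len / c_per_len)                  # lossless characteristic impedance
    r_total = r_per_len * length
    return t_rise < t_flight, r_total < z0


# Hypothetical 10 mm line: 0.5 nH/mm, 0.2 pF/mm, 1 ohm/mm, driven by a 50 ps edge
print(line_checks(t_rise=50e-12, length=10e-3, r_per_len=1e3, l_per_len=5e-7, c_per_len=2e-10))
# (True, True): Z0 = 50 ohm, time of flight = 100 ps, total R = 10 ohm
```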
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~
Wire Load Models
Extraction data from already-routed designs is used to build a lookup table known as the
wire load model (WLM). A WLM provides statistical estimates of R and C as a function of
“net fanout”.
For fanouts greater than those specified in a wire load table, a “slope factor” is specified
for linear extrapolation.
wire_load (“5KGATES”) {
Eg:
Fanout = 7 (two more than the largest fanout in the table, 5, whose net length is 135.98)
Net length = 135.98 + 2 × 29.4005 (slope) = 194.78 --> length of a net with a fanout of 7
Resistance = 194.78 × 0.000271 = 0.05279 units
Capacitance = 194.78 × 0.00017 = 0.03311 units
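The fanout-7 calculation can be reproduced with a small lookup function. This is a sketch under stated assumptions: the table is a plain dict {fanout: length}, only the fanout-5 entry (135.98) is implied by the example, and linear interpolation between table entries is an assumption about how a tool treats missing fanouts.

```python
from bisect import bisect_left


def wlm_net_length(fanout, fanout_length, slope):
    """Estimated net length from a wire load table {fanout: length}.

    Exact entries are returned directly, fanouts between entries are linearly
    interpolated, and fanouts above the largest entry are extrapolated with `slope`.
    """
    if fanout in fanout_length:
        return fanout_length[fanout]
    fanouts = sorted(fanout_length)
    if fanout > fanouts[-1]:
        last = fanouts[-1]
        return fanout_length[last] + (fanout - last) * slope
    i = bisect_left(fanouts, fanout)
    if i == 0:
        raise ValueError("fanout is below the smallest table entry")
    f0, f1 = fanouts[i - 1], fanouts[i]
    l0, l1 = fanout_length[f0], fanout_length[f1]
    return l0 + (fanout - f0) * (l1 - l0) / (f1 - f0)


def wlm_net_rc(fanout, fanout_length, slope, res_per_len, cap_per_len):
    """Return (length, resistance, capacitance) for a net with the given fanout."""
    length = wlm_net_length(fanout, fanout_length, slope)
    return length, length * res_per_len, length * cap_per_len


# Values from the "5KGATES" example; the table here holds only the implied fanout-5 entry.
length, res, cap = wlm_net_rc(7, {5: 135.98}, slope=29.4005, res_per_len=0.000271, cap_per_len=0.00017)
print(f"length={length:.2f}  R={res:.5f}  C={cap:.5f}")
```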
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~
Wire load models for synthesis
Wire load modeling allows us to estimate the effect of wire length and fanout on the
resistance, capacitance and area of nets. The synthesis tool uses these physical values to
calculate wire delays and circuit speeds. Semiconductor vendors develop wire load models
based on statistical information specific to the vendor’s process. The models include
coefficients for area, capacitance and resistance per unit length, and a fanout-to-length
table for estimating net lengths (the number of fanouts determines a nominal length).
Selection of wire load models in the initial stage (before physical design) depends on the
following factors:
1. User specification
Once the final routing step in the physical design stage is over, wire load models are
generated based on the actual routing in the design, and synthesis is redone using those
wire load models.
In hierarchical designs, we have to determine which wire load model to use for nets that
cross hierarchical boundaries. There are three modes for making this choice:
Top:
The same wire load model is applied to all nets as if the design had no hierarchy: the wire
load model specified for the top level of the design hierarchy is used for all nets in the
design and its subdesigns.
Enclosed:
The wire load model of the smallest design that fully encloses the net is applied. If the
design enclosing the net has no wire load model, the tool traverses the design hierarchy
upward until it finds one. Enclosed mode is more accurate than top mode when cells in the
same design are placed in a contiguous region during layout.
Use enclosed mode if the design has similar logical and physical hierarchies.
Segmented:
The wire load model for each segment of a net is determined by the design encompassing the
segment. Nets crossing hierarchical boundaries are divided into segments, and for each net
segment the wire load model of the design containing the segment is used. If the design
containing a segment has no wire load model, the tool traverses the design hierarchy
upward until it finds one.
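The three modes differ only in which design's wire load model is looked up for each net segment. The sketch below models a hypothetical hierarchy (design names and WLM names are made up) and applies top, enclosed and segmented selection to a net that runs from ALU through CORE into DECODE, walking upward whenever a design has no model of its own. It illustrates the selection rules described above and is not a reproduction of any tool's implementation.

```python
# Hypothetical design hierarchy: design -> (parent design, wire load model or None)
HIERARCHY = {
    "TOP": (None, "wlm_100k"),
    "CORE": ("TOP", "wlm_20k"),
    "ALU": ("CORE", "wlm_5k"),
    "DECODE": ("CORE", None),
}


def ancestors(design):
    """Return [design, parent, ..., top]."""
    chain = []
    while design is not None:
        chain.append(design)
        design = HIERARCHY[design][0]
    return chain


def inherited_wlm(design):
    """WLM of the design, or of the nearest ancestor that has one."""
    for d in ancestors(design):
        if HIERARCHY[d][1] is not None:
            return HIERARCHY[d][1]
    raise LookupError(f"no wire load model above {design}")


def top_mode(segments):
    top = ancestors(segments[0])[-1]
    return [inherited_wlm(top)] * len(segments)


def enclosed_mode(segments):
    common = set(ancestors(segments[0]))
    for d in segments[1:]:
        common &= set(ancestors(d))
    smallest = max(common, key=lambda d: len(ancestors(d)))  # deepest common ancestor
    return [inherited_wlm(smallest)] * len(segments)


def segmented_mode(segments):
    return [inherited_wlm(d) for d in segments]


# A net from ALU to DECODE has one segment in each design it passes through.
net = ["ALU", "CORE", "DECODE"]
for mode in (top_mode, enclosed_mode, segmented_mode):
    print(mode.__name__, mode(net))
```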
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~
Interconnect Delay vs. Deep Sub Micron Issues
The performance of deep sub-micron ICs is limited by increasing interconnect loading
effects. Long global clock networks account for a large part of the power consumption in
chips. Traditional CAD design methodologies are strongly affected by interconnect scaling.
The resistance and capacitance of interconnects have increased due to smaller wire cross
sections, smaller wire pitch and longer lengths, resulting in increased RC delay. As
technology advances, interconnect scaling continues, and in such a scenario the increased
RC delay is becoming a major bottleneck in improving the performance of advanced ICs.
Here, gate delay and interconnect delay are shown as functions of technology node, from
180 nm to 60 nm. The interconnect delays shown assume a line with optimally connected
repeaters and include the delay due to the repeaters. The graph shows that as technology
shrinks, gate delay decreases but interconnect delay increases.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~
01 September 2008
Wire delays, or extrinsic delays, are calculated using the output drive strength, input
capacitance and wire load models. Other delays are intrinsic properties of each gate.
• The input capacitance of a logic gate is a function of output state, output loads and
input slew rate.
• Internal timing arcs and the output slew rate are functions of the switching input(s).
Gate delay = f(input transition time, Cnet + Cpin)
or
Gate delay = f(input transition time, Cload)
where Cload = Cnet + Cpin
Cnet --> Net capacitance
Cpin --> Pin capacitance of the driven cells
Cell or gate delay is calculated using Non-Linear Delay Models (NLDM). NLDM is highly
accurate because it is derived from SPICE characterizations. The delay is a function of the
input transition time (i.e. slew) of the cell, the wire capacitance and the pin capacitance
of the driven cells. A slow input transition slows the rate at which the cell’s transistors
can change state from logic 1 to logic 0 (or logic 0 to logic 1), as does a large output
load Cload (Cnet + Cpin); both increase the delay of the logic gate.
There is another NLDM table in the library to calculate the output transition. The output
transition of a cell becomes the input transition of the next cell down the chain.
• Table models are usually two-dimensional, allowing lookups based on the input slew and
the output load (Cload). A sample table is given below.
timing() {
related_pin : "CKN";
timing_type : falling_edge;
timing_sense : non_unate;
cell_rise(delay_template_7x7) {
values ( \
rise_transition(delay_template_7x7) {
Situation 1:
Input transition and output load values match the table index values
If both the input transition and the output load values match table index values, the
corresponding delay value is picked up directly from the delay “values” table, as
highlighted by the yellow shaded data.
Situation 2:
Output load value doesn't match the table index values
• When the actual load capacitance value does not fall directly on one of the load-axis
index points, the delay is determined by interpolation from the closest points. Note that
to carry out this interpolation, the input transition must match one of the table index
values.
• Determine the equation for the line segment connecting the two nearest points in
the table.
Slope m = (y2-y1)/(x2-x1), where (y2-y1) is the delay segment (generally in ns) on the
y-axis and (x2-x1) is the load segment (generally in pF) on the x-axis.
y = mx+c
where
y-->delay (ns)
m-->slope
x-->load point of interest (pF)
c-->y-intercept (ns)
The load point of interest is the load capacitance value for which the delay has to be
calculated.
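Situation 2 is a straight-line interpolation along one table row. The sketch below assumes the input transition already matches an index value, so `delays` is that single row, and that the load lies within the index range; the index and delay values are hypothetical.

```python
from bisect import bisect_left


def interpolate_delay(load, load_index, delays):
    """Delay at `load` by linear interpolation between the two nearest load index points.

    `delays` is the table row for the (matching) input transition.
    """
    if not load_index[0] <= load <= load_index[-1]:
        raise ValueError("load outside table range: extrapolation needed")
    j = max(1, bisect_left(load_index, load))
    x1, x2 = load_index[j - 1], load_index[j]
    y1, y2 = delays[j - 1], delays[j]
    m = (y2 - y1) / (x2 - x1)  # slope: delay segment / load segment
    c = y1 - m * x1            # intercept of y = m*x + c
    return m * load + c


# Hypothetical row: loads in pF, delays in ns
LOADS = [0.01, 0.05, 0.10]
ROW = [0.10, 0.22, 0.40]
print(interpolate_delay(0.03, LOADS, ROW))   # between 0.01 pF and 0.05 pF
print(interpolate_delay(0.075, LOADS, ROW))  # between 0.05 pF and 0.10 pF
```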
Situation 3:
Neither the input transition nor the output load value matches the table index values
• If neither the input transition nor the load capacitance value matches a lookup table
index value exactly, bilinear interpolation is used.
• Three linear interpolations are performed on the four closest table data points, as
highlighted in violet in the lookup table.
Situation 4:
Output load value doesn't match the table index values and is outside the table boundary
• When the load point is outside the boundary of the index, the delay is extrapolated from
the closest known points.
• Lookup values too far outside the range of the table can lead to inaccuracy. [Cadence]
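All four situations can be handled by one bilinear lookup. The sketch below assumes table rows are indexed by input slew and columns by output load (the actual axis order depends on the library template), uses a hypothetical 2x2 table, and clamps to the edge pair of index points so that out-of-range points are linearly extrapolated from the closest known points. An exact index hit (Situation 1) falls out of the same arithmetic.

```python
from bisect import bisect_right


def _edge_pair(axis, x):
    """Indices of the two index points used for x, clamped to the edge pair so that
    points outside the table are extrapolated from the closest known points."""
    i = bisect_right(axis, x) - 1
    i = max(0, min(i, len(axis) - 2))
    return i, i + 1


def nldm_lookup(slew, load, slew_index, load_index, values):
    """Delay from a 2-D table values[slew_row][load_col] by bilinear interpolation.

    Two interpolations along the load axis, then one along the slew axis
    (three linear interpolations over the four surrounding points).
    """
    i0, i1 = _edge_pair(slew_index, slew)
    j0, j1 = _edge_pair(load_index, load)
    s0, s1 = slew_index[i0], slew_index[i1]
    c0, c1 = load_index[j0], load_index[j1]
    d0 = values[i0][j0] + (load - c0) * (values[i0][j1] - values[i0][j0]) / (c1 - c0)
    d1 = values[i1][j0] + (load - c0) * (values[i1][j1] - values[i1][j0]) / (c1 - c0)
    return d0 + (slew - s0) * (d1 - d0) / (s1 - s0)


# Hypothetical 2x2 table: rows = input slew (ns), columns = output load (pF), values in ns
SLEW = [0.1, 0.5]
LOAD = [0.01, 0.1]
DELAY = [[0.20, 0.50],
         [0.30, 0.70]]

print(nldm_lookup(0.5, 0.1, SLEW, LOAD, DELAY))    # Situation 1: exact index hit
print(nldm_lookup(0.3, 0.055, SLEW, LOAD, DELAY))  # Situation 3: bilinear interpolation
print(nldm_lookup(0.1, 0.19, SLEW, LOAD, DELAY))   # Situation 4: load beyond the table
```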
Intrinsic delay
• Intrinsic delay is the delay internal to the gate, from the input pin of the cell to the
output pin of the cell.
• It is defined as the delay between an input and output pair of a cell when a near-zero
slew is applied to the input pin and the output sees no load. It is caused by the internal
capacitance associated with the cell's transistors.
• This delay depends largely on the size of the transistors forming the gate, because
increasing transistor size increases internal capacitance.
12 August 2008
Timing analysis is an integral part of the ASIC/VLSI design flow. Anything else can be
compromised, but not timing! Timing analysis can be static or dynamic. Dynamic timing
analysis verifies the functionality of the design by applying input vectors and checking
for correct output vectors, whereas Static Timing Analysis checks the static delay
requirements of the circuit without any input or output vectors.
SPICE Simulation
Device-level timing analysis is carried out using SPICE simulation. SPICE
simulation is essential for full-custom designs to verify the electrical
properties of the design. These properties are calculated from mathematical
equations that represent the electrical behavior of the devices. Material and some
electrical properties of the devices, represented either as variables or constants,
are stored in model files; examples are the threshold voltage of a MOSFET,
electron density, etc. SPICE-characterized data is tabulated in technology
libraries, which become the basic delay information for Static Timing Analysis.
For example, consider an AND gate: electrical properties such as input and output
transition, propagation delay, output capacitance, etc. are evaluated by SPICE
simulation. SPICE simulation gives the highest accuracy of any form of
simulation. SPICE code is manually written and simulated, so for a large design
SPICE simulation is a cumbersome job. Specific tools are available for
transistor-level Static Timing Analysis (STA), e.g. PathMill from Synopsys, with
SPICE simulation as the backbone of all these tools.
In Static Timing Analysis (STA), static delays such as gate delays and net delays
are considered in each path, and these delays are compared against their
required maximum and minimum values. The circuit to be analyzed is broken into
timing paths consisting of gates, flip-flops and their interconnections.
Each timing path has to process the data within a clock period, which is
determined by the maximum frequency of operation. Cell delays are available in
the corresponding technology libraries; they are tabulated by input transition
and fanout load and characterized by SPICE simulation. Net delays are calculated
from Wire Load Models (WLM) or from extracted resistance R and capacitance C.
Wire Load Models are available in the technology library; they are table lookup
(TLU) values based on net fanout and length.
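The core of STA, propagating the latest arrival time through a timing graph and comparing it against the clock period, fits in a few lines. The sketch below assumes an acyclic graph, an ideal clock (no skew or uncertainty) and hypothetical delays; it makes a single pass in topological order, so its runtime is linear in the number of arcs.

```python
from collections import defaultdict, deque


def arrival_times(arcs):
    """Latest arrival time at each pin via one topological pass (linear in number of arcs).

    Pins with no incoming arc are treated as start points with arrival time 0.
    """
    fanout = defaultdict(list)
    indegree = defaultdict(int)
    pins = set()
    for src, dst, delay in arcs:
        fanout[src].append((dst, delay))
        indegree[dst] += 1
        pins.update((src, dst))

    arrival = {p: 0.0 for p in pins if indegree[p] == 0}
    ready = deque(arrival)
    while ready:
        pin = ready.popleft()
        for dst, delay in fanout[pin]:
            arrival[dst] = max(arrival.get(dst, float("-inf")), arrival[pin] + delay)
            indegree[dst] -= 1
            if indegree[dst] == 0:  # all fan-in processed: arrival[dst] is final
                ready.append(dst)
    return arrival


# Hypothetical register-to-register path group, delays in ns (clock-to-Q, gates, nets)
ARCS = [
    ("FF1/CK", "FF1/Q", 0.12),
    ("FF1/Q", "U1/Y", 0.20),
    ("U1/Y", "U2/Y", 0.25),
    ("FF1/Q", "U2/Y", 0.30),
    ("U2/Y", "FF2/D", 0.05),
]
CLOCK_PERIOD = 1.0  # ns
SETUP_TIME = 0.05   # ns, setup requirement of FF2

at = arrival_times(ARCS)
setup_slack = CLOCK_PERIOD - SETUP_TIME - at["FF2/D"]  # ideal clock: no skew, no uncertainty
print(f"arrival at FF2/D = {at['FF2/D']:.2f} ns, setup slack = {setup_slack:.2f} ns")
```

The slower of the two reconvergent branches (through U1) sets the arrival time at U2/Y, which is why STA needs no input vectors: it simply takes the worst case at every node.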
The static timing analyzer can perform the following types of analysis:
Case Analysis
Derived Clocks
Netlist Editing
Report_clock_timing
Path-Based Analysis
The widespread use of STA can be attributed to several factors [David]:
• The basic STA algorithm is linear in runtime with circuit size, allowing
analysis of designs in excess of 10 million instances.
• The basic STA analysis is conservative in the sense that it will overestimate
the delay of long paths in the circuit and underestimate the delay of short
paths in the circuit. This makes the analysis “safe”, guaranteeing that the
design will function at least as fast as predicted and will not suffer from
hold-time violations.
• The STA algorithms have become fairly mature, addressing critical timing
issues such as interconnect analysis, accurate delay modeling, false or
multi-cycle paths, etc.
Advantages of STA:
• All timing paths are considered for the timing analysis. This is not the case
in simulation.
• Analysis times are relatively short when compared with event and circuit
simulation.
• Timing can be analyzed for the worst case and the best case simultaneously.
This type of analysis is not possible in dynamic timing analysis.
• Static Timing Analysis (STA) works with timing models. STA is more
pessimistic and thus gives the maximum delay of the design. DTA performs
full timing simulation; the problem with DTA is the computational complexity
of finding the input patterns (vectors) that produce the maximum delay at the
output, and hence it is slow.
Disadvantages of STA:
• Not all paths in the design always run at their worst-case delay, so the analysis
is pessimistic.
• All clock-related information has to be fed to the design in the form of constraints.
• Inconsistent, incorrect or under-specified constraints may lead to disastrous timing
analysis results.
References
[David] David Blaauw, Kaviraj Chopra, Ashish Srivastava and Lou Scheffer,
"Statistical Timing Analysis: From Basic Principles to State of the Art",
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (T-CAD).
07 July 2008
The interview questions below were contributed by ASIC_diehard (thanks a lot!). These
questions were asked for a senior position in the physical design domain and also relate
to Static Timing Analysis and synthesis. Answers to some questions are given as links; the
remaining questions will be answered in upcoming posts.
Common introductory questions every interviewer asks are:
Intel
• Why are power stripes routed in the top metal layers?
Answer:
The resistivity of the top metal layers is lower, and hence less IR drop is seen in the power
distribution network. If power stripes were routed in lower metal layers, they would use a
good amount of the lower routing resources and could therefore create routing congestion.
Answer:
This approach allows better routability of the design and better usage of routing resources.
Answer:
Improve the input transition of the cell under consideration by upsizing its driver.
Reduce the load seen by the cell under consideration, either by placement refinement or
buffering.
If allowed, increase the drive strength or replace it with an LVT (low threshold voltage) cell.
• How do you compute net delay (interconnect delay) / decode the RC values present in the
tech file?
• What are the various ways of timing optimization in synthesis tools?
Answer:
Logic optimization: buffer sizing, cell sizing, level adjustment, dummy buffering, etc.
Optimize the drive strength of cells so that they are capable of driving more load, hence
reducing cell delay.
Better selection of DesignWare components (select timing-optimized DesignWare
components).
Use LVT (low threshold voltage) and SVT (standard threshold voltage) cells if allowed.
• What would you do in order to not use certain cells from the library?
Answer:
Answer:
For a given wire load model, the delay is estimated based on the number of fanouts of the
cell driving the net.
Values of unit resistance R and unit capacitance C are given in the technology file.
Once the net length is known, the delay can be calculated; sometimes it is also tabulated.
• What are various techniques to resolve congestion/noise?
Answer:
Routing and placement congestion both depend on the connectivity in the netlist; a better
floorplan can reduce congestion.
• Let’s say there are enough routing resources available and timing is fine. Can you
increase the number of clock buffers in the clock network? If so, will there be any impact
on other parameters?
Answer:
No. You should not add clock buffers to the clock network. More clock buffers cause more
area and more power. When everything is fine, why would you want to touch the clock tree?
Answer:
Provide better skew targets and insertion delay values while building the clocks.
Choose an appropriate tree structure: based on clock buffers, clock inverters, or a mix of
clock buffers and clock inverters.
For multiple clock domains, group the clocks while building the clock tree so that skew is
balanced across the clocks (inter-clock skew analysis).
• How you go about fixing timing violations for latch- latch paths?
• As an engineer, let’s say your manager comes to you and asks for next project die
size estimation/projection, giving data on RTL size, performance requirements.
How do you go about the figuring out and come up with die size considering
physical aspects?
• How would you insert a voltage island scheme between macro pins that cross the core and
are in different power wells? What is the optimal resource solution?
• What are the various formal verification issues you faced and how did you resolve them?
• How do you calculate maximum frequency given setup, hold, clock and clock
skew?
• What are effects of metastability?
Answer: Metastability
• Consider a timing path crossing from fast clock domain to slow clock domain.
How do you design synchronizer circuit without knowing the source clock
frequency?
• How to solve cross clock timing path?
• How to determine the depth of FIFO/ size of the FIFO?
STmicroelectronics
• What are the challenges you faced in place and route, FV (Formal Verification),
ECO (Engineering Change Order) areas?
• How long is the design cycle for your designs?
• Which parts of physical design are your areas of interest?
• Explain ECO (Engineering Change Order) methodology.
• Explain CTS (Clock Tree Synthesis) flow.
• If there are too many pins of the logic cells in one place within core, what kind of
issues would you face and how will you resolve?
• Define hash/@array in Perl.
• Using TCL (Tool Command Language, Tickle) how do you set variables?
• What is ICC (IC Compiler) command for setting derate factor/ command to
perform physical synthesis?
• What are nanoroute options for search and repair?
• What were your design skew/insertion delay targets?
• How is IR drop analysis done? What are various statistics available in reports?
• Explain pin density/ cell density issues, hotspots?
• How will you relate routing grid with manufacturing grid and judge if the routing
grid is set correctly?
• What is the command for setting multi cycle path?
• If hold violation exists in design, is it OK to sign off design? If not, why?
Qualcomm
• In building the timing constraints, do you need to constrain all IO (Input-Output)
ports?
• Can a single port be multi-clocked? How do you set delays for such ports?
• How is scan DEF (Design Exchange Format) generated?
• What is purpose of lockup latch in scan chain?
• Explain short circuit current.
Answer:
Answer:
Answer:
• What constraints you add in CTS (Clock Tree Synthesis) for clock gates?
Answer:
• What is trade off between dynamic power (current) and leakage power
(current)?
Answer:
Dynamic Power
• Explain top level pin placement flow? What are parameters to decide?
• Given block level netlists, timing constraints, libraries, macro LEFs (Layout
Exchange Format/Library Exchange Format), how will you start floor planning?
• With net length of 1000um how will you compute RC values, using
equations/tech file info?
• What do noise reports represent?
• What does glitch reports contain?
• What are CTS (Clock Tree Synthesis) steps in IC compiler?
• What do clock constraints file contain?
• How to analyze clock tree reports?
• What do VoltageStorm IR drop reports represent?
• Where /when do you use DCAP (Decoupling Capacitor) cells?
• What are various power reduction techniques?
Hughes Networks
• What is setup/hold? What are setup and hold time impacts on timing? How will
you fix setup and hold violations?
• Explain the function of a Muxed FF (Multiplexed Flip Flop)/scan FF (Scan Flip Flop).
• What are tested in DFT (Design for Testability)?
• In equivalence checking, how do you handle scanen signal?
• In terms of CMOS (Complementary Metal Oxide Semiconductor), explain the physical
parameters that affect propagation delay.
• What are power dissipation components? How do you reduce them?
Answer:
Dynamic Power
Hynix Semiconductor
• How do you optimize power at various stages in the physical design flow?
• What timing optimization strategies you employ in pre-layout /post-layout stages?
• What are process technology challenges in physical design?
• Design divide by 2, divide by 3, and divide by 1.5 counters. Draw timing
diagrams.
• What are multi-cycle paths, false paths? How to resolve multi-cycle and false
paths?
• Given a flop-to-flop path with combinational delay in between and the output of the
second flop fed back to the combinational logic, which path is the most likely to have a
hold violation and how will you resolve it?
• What RTL (Register Transfer Level) coding styles should be adopted to yield an optimal
backend design?
• Draw timing diagrams to represent the propagation delay, set up, hold, recovery,
removal, minimum pulse width.
Clock Tree Synthesis (CTS)
The goal of CTS is to minimize skew and insertion delay. The clock is not propagated before
CTS, as shown in Figure (1).
After CTS, hold slack should improve. The clock tree begins at the clock source defined in
the .sdc file and ends at the stop pins of flops. There are two types of stop pins, known as
ignore pins and sync pins. ‘Don’t touch’ circuits and pins in the front end (logic synthesis)
are treated as ‘ignore’ circuits or pins in the back end (physical synthesis). ‘Ignore’ pins
are ignored for timing analysis. If the clock is divided, a separate skew analysis is
necessary.
Global skew achieves zero skew between two synchronous pins without considering the
logic relationship.
Local skew achieves zero skew between two synchronous pins while considering the logic
relationship.
If the clock is skewed intentionally to improve setup slack, it is known as useful skew.
Rigidity is a term coined in Astro to indicate the relaxation of constraints: the higher the
rigidity, the tighter the constraints.
In Clock Tree Optimization (CTO), the clock can be shielded so that noise is not coupled to
other signals, but shielding increases area by 12 to 15%. Since the clock signal is global in
nature, the same metal layer used for power routing is also used for the clock. CTO is
achieved by buffer sizing, gate sizing, buffer relocation, level adjustment and HFN (high
fanout net) synthesis. Setup slack is improved in the pre-placement, placement and
post-placement optimization stages before CTS, while hold slack is neglected. Hold slack is
improved in post-placement optimization after CTS. As a result of CTS, many buffers are
added; generally, around 650 buffers are added for 100k gates.
**********************************************************************
*
* Clock Tree Skew Reports
*
* Tool : Astro
* Version : V-2004.06 for IA.32 -- Jul 12, 2004
* Design : sam_cts
* Date : Sat May 19 16:09:20 2007
*
**********************************************************************
Clock: clock
Pin: clock
Net: clock
26 September 2007
Clock
Clock Tree Synthesis (CTS) tools should be aware of the different power
domains and understand level shifters, so that they can insert them in
appropriate places. The clock tree is routed through level shifters to
reach the different power domains. Simultaneous timing analysis and
optimization across multiple voltage domains is necessary, which makes CTS
more complex in multi-voltage designs.
Multi-level and dynamic voltage scaling pose an even greater challenge.
Constraints are specified for each supply voltage level or operating
point. There can be different operating modes for different voltages,
the constraints need not be the same for all modes and voltages, and the
performance target for each mode can vary. The EDA tool should be capable
of handling all these situations simultaneously when carrying out timing
analysis, so that the different constraints at different modes and
voltages are all satisfied.
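The multi-mode requirement can be sketched in Python under a simplifying assumption. Each mode is reduced to one clock period and one reg-to-reg path delay (clk-to-q + logic + setup) evaluated at that mode's voltage. The mode names, voltages, periods and delays are hypothetical. A real tool runs full static timing analysis in every mode.

```python
from dataclasses import dataclass


@dataclass
class Mode:
    name: str
    vdd: float            # supply voltage of this operating point (V)
    period_ns: float      # clock period constrained for this mode
    path_delay_ns: float  # clk-to-q + logic + setup at this voltage


def setup_slack(mode: Mode) -> float:
    return mode.period_ns - mode.path_delay_ns


# Hypothetical operating points: lower voltage, slower path, longer period.
modes = [
    Mode("turbo", 1.2, 2.0, 1.80),
    Mode("nominal", 1.0, 3.0, 2.60),
    Mode("low_power", 0.8, 5.0, 5.20),
]

for m in modes:
    print(f"{m.name:10s} Vdd={m.vdd} V  slack={setup_slack(m):+.2f} ns")

# The path is signed off only if it meets timing in every mode.
failing = [m.name for m in modes if setup_slack(m) < 0]
print("failing modes:", failing)
```

In this made-up case the path passes at 1.2 V and 1.0 V but fails in the 0.8 V mode. Checking only the fastest mode would miss the failure.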
Subthreshold current flows between source and drain even when the gate-to-source
voltage is lower than the threshold voltage of the device. This happens due to carrier
diffusion between the source and drain regions of the CMOS transistor in weak inversion.
When the gate-to-source voltage is smaller than, but very close to, the threshold voltage
of the device, the subthreshold current becomes significant.
As observed by [4], subthreshold leakage currently still plays the main part among the
three leakage mechanisms. However, researchers believe that gate leakage and reverse-biased
junction Band To Band Tunneling (BTBT) will become as important as subthreshold leakage
from the 45 nm process node downwards. In addition, with technology scaling, the gate oxide
thickness will be reduced and the substrate doping densities will be increased. As a result,
other effects such as gate-induced drain leakage (GIDL) and drain-induced barrier lowering
(DIBL) will also become more and more evident. Therefore, future effective low-leakage
design will need to target several components, since all of them play an important role
in the total leakage consumption. Various techniques at the process and circuit level exist
to reduce leakage consumption, including modifying the doping profile, oxide thickness and
channel length. Forward or reverse body biasing is another such technique; it results in
variable-threshold CMOS.
The subthreshold current Isub, which flows when the gate voltage is below the threshold
voltage Vth, is a major part of the leakage current [2]. Isub depends on several effects and
voltages, which are formulated in the following equations [1]:
Where
T is the temperature,
μ is the mobility,
εox and εSi are the dielectric constants of the gate oxide and silicon,
Where
k’ is a technology constant,
The transfer characteristics of a MOSFET for VGS near Vth are shown in the figure below.
Transfer characteristics of MOSFET VGS near Vth [2]
From the figure above, it can be observed that ID increases exponentially as Vth is
reduced.
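The exponential dependence can be made concrete with a minimal Python sketch. It uses a common textbook form of the subthreshold model, which is not necessarily the exact equation set of [1]: Isub = I0 · exp((VGS - Vth)/(n·VT)) · (1 - exp(-VDS/VT)), with VT = kT/q. The prefactor I0 = 100 nA and the slope factor n = 1.5 are assumed placeholder values.

```python
import math

K_B = 1.380649e-23   # Boltzmann constant (J/K)
Q = 1.602176634e-19  # elementary charge (C)


def thermal_voltage(temp_k: float) -> float:
    """VT = kT/q in volts."""
    return K_B * temp_k / Q


def i_sub(vgs, vth, vds, temp_k=300.0, n=1.5, i0=1e-7):
    """Textbook subthreshold current (A); i0 and n are assumed placeholders."""
    vt = thermal_voltage(temp_k)
    return i0 * math.exp((vgs - vth) / (n * vt)) * (1 - math.exp(-vds / vt))


# Subthreshold swing: gate-voltage change needed for a 10x change in current.
swing_mv = 1.5 * thermal_voltage(300.0) * math.log(10) * 1e3
print(f"subthreshold swing ~ {swing_mv:.0f} mV/decade")

# Off-state leakage (Vgs = 0, Vds = 1 V) for two threshold voltages.
for vth in (0.40, 0.30):
    print(f"Vth={vth:.2f} V  Ioff={i_sub(0.0, vth, 1.0):.3e} A")
```

With these assumed values, lowering Vth by 100 mV raises the off-current by roughly 13×. Every drop of one subthreshold swing (about 89 mV here) multiplies it by 10.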
As noted by [4], the key dependencies of the subthreshold slope can be summarized as
follows:
An increase in the threshold voltage of the device keeps the Vgs of the NMOS transistor
safely below Vt,n; this is the case for a logic-zero input. For a logic-one input, an
increase in the threshold voltage keeps the |Vgs| of the PMOS transistor safely below
|Vt,p|.
References
[1] Anantha P. Chandrakasan, Samuel Sheng and Robert W. Brodersen, “Low Power CMOS
Digital Design”, IEEE Journal of Solid-State Circuits, vol. 27, no. 4, pp. 472-484,
April 1992.
[3] Frank Sill, Frank Grassert and Dirk Timmermann, “Reducing Leakage with Mixed-Vth
(MVT)”, 18th International Conference on VLSI Design, IEEE, pp. 874-877, January
2005.
[4] Wei Liu, “Techniques for Leakage Power Reduction in Nanoscale Circuits: A Survey”,
IMM Technical Report, Department of Informatics and Mathematical Modeling, Technical
University of Denmark, 2007.
• 1) Chip utilization depends on ___.
• 4) Delay between shortest path and longest path in the clock is called ____.
a. Useful skew
b. Local skew
c. Global skew
d. Slack
a. Clock nets
b. Signal nets
c. IO nets
d. PG nets
a. Minimum IR Drop
b. Minimum EM
c. Minimum Skew
d. Minimum Slack
a. Before Placement
b. After Placement
c. Before CTS
d. After CTS
• 10) To achieve better timing ____ cells are placed in the critical path.
a. HVT
b. LVT
c. RVT
d. SVT
a. Frequency
b. Load Capacitance
c. Supply voltage
d. Threshold Voltage
a. Reducing IR Drop
b. Reducing DRC
c. Reducing EM violations
d. None
• 14) Maximum current density of a metal is available in ___.
a. .lib
b. .v
c. .tf
d. .sdc
• 16) The minimum height and width a cell can occupy in the design is called
___.
a. Max delay is used for launch path and Min delay for capture path
b. Min delay is used for launch path and Max delay for capture path
c. Both Max delay is used for launch and Capture path
d. Both Min delay is used for both Capture and Launch paths
• 19) "Total metal area and/or perimeter of conducting layer / gate area"
is called ___.
a. Utilization
b. Aspect Ratio
c. OCV
d. Antenna Ratio
• 21) To avoid cross talk, the shielded net is usually connected to ___.
a. VDD
b. VSS
c. Both VDD and VSS
d. Clock
• 22) If the data is faster than the clock in a reg-to-reg path, a ___ violation may
occur.
a. Setup
b. Hold
c. Both
d. None
a. Before placement
b. After placement
c. Before CTS
d. After CTS
a. Max tran
b. Max cap
c. Max fanout
d. Max current density
• 26) Which of the following has the highest priority at the final (post-routed)
stage of the design?
a. Setup violation
b. Hold violation
c. Skew
d. None
a. CLKBUF
b. BUF
c. INV
d. CLKINV
• 28) Maximum voltage drop will occur at (without macros) ___.
a. Min width
b. Min spacing
c. Min width - min spacing
d. Min width + min spacing
a. Floorplanning
b. Placement
c. Design Synthesis
d. CTS
• 33) If the technology file has 7 metal layers, which metals will you use for
power?
• 34) If metal6 and metal7 are used for power in a 7-metal-layer process
design, which metals will you use for the clock?
• 35) In a reg-to-reg timing path, Tclocktoq is 0.5 ns, Tcombo is 5 ns and Tsetup is
0.5 ns. The clock period should be ___.
a. 1ns
b. 3ns
c. 5ns
d. 6ns
• 38) What is the effect of adding a high drive strength buffer to a long net?
• 40) After the final routing the violations in the design ___.
a. Constant
b. Decrease
c. Increase
d. None of the above
a. Power routing
b. Signal routing
c. Power and Signal routing
d. None of the above.
a. Clock buffer
b. Clock Inverter
c. AOI cell
d. None of the above
• Answers:
1)b
2)c
3)b
4)c
5)b
6)d
7)a
8)c
9)d
10)b
11)d
12)d
13)b
14)c
15)b
16)a
17)c
18)a
19)d
20)a
21)b
22)b
23)d
24)d
25)c
26)b
27)a
28)c
29)d
30)c
31)d
32)c
33)d
34)c
35)d
36)c
37)a
38)c
39)b
40)d
41)c
42)a
43)a
44)c
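Questions 22 and 35 rest on the two basic reg-to-reg checks. Below is a minimal Python sketch that assumes ideal clocks (zero skew, no uncertainty). The hold-case numbers are hypothetical.

```python
def min_period(t_clk2q, t_comb_max, t_setup):
    """Setup check: launch + slowest data path + setup must fit in one period."""
    return t_clk2q + t_comb_max + t_setup


def hold_slack(t_clk2q, t_comb_min, t_hold):
    """Hold check: new data launched by an edge must not reach the capture flop
    before t_hold after that same edge (negative result = hold violation)."""
    return (t_clk2q + t_comb_min) - t_hold


# Question 35: 0.5 ns clk-to-q, 5 ns combinational delay, 0.5 ns setup.
print(f"minimum clock period = {min_period(0.5, 5.0, 0.5):.1f} ns")

# Question 22 style case: a very short data path (hypothetical values, ns).
print(f"hold slack = {hold_slack(0.05, 0.01, 0.10):+.2f} ns")
```

The 6.0 ns result matches answer 35)d. The negative hold slack is what question 22 means by data being faster than the clock.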
saud said...
According to me... before CTS there is an ideal clock and no real clock is present. If the
real clock is not present, we don't know the skew and hence cannot fix hold
accurately...
You are welcome to correct me if I am wrong.
Murali said...
hi saud,
rgds
murali
hi saud
Hold violations are fixed after CTS; this is called propagated clock mode.
Murali said...
The problem with an inverter is that it inverts the logic level... hence, to get back the
original logic, you have to use one more inverter, which will ultimately increase area.
Grigor said...
Hi murali,
Regarding increased area, you are right if we have only a one-stage clock tree.
Generally, an inverter of the same drive strength contains fewer transistors than a buffer.
So if we have two logically equivalent clock trees with more than 3 stages (which
is the case in most designs), the area is smaller with an inverter tree than with a
buffer tree.
Which one is preferable depends on the design.
It is an arguable question, but an INVERTER clock tree has more advantages (less area,
smaller skew, smaller insertion delay, smaller duty cycle distortion) than a buffer tree.
Anonymous said...
Please explain the answer to 18) OCV timing for setup time
Anonymous said...
I feel the answer is Standard cells, macros and pads as pad area also plays
important role in chip utilisation.
Correct me if i am wrong.
Anonymous said...
Hi
I need an answer on the difference between HVT and LVT cell construction.
• What parameters (or aspects) differentiate Chip Design & Block level
design?
• How do you place macros in a full chip design?
• Differentiate between a Hierarchical Design and flat design?
• Which is more complicated when you have a 48 MHz and a 500 MHz clock
design?
• Name a few tools which you used for physical verification.
• What input files will you give for PrimeTime correlation?
• What are the algorithms used while routing? Will it optimize wire length?
• How will you decide the Pin location in block level design?
• If the routing congestion exists between two macros, then what will you
do?
• How will you place the macros?
• How will you decide the die size?
• If lengthy metal layer is connected to diffusion and poly, then which one
will affect by antenna problem?
• If the full chip design is routed by 7 layer metal, why macros are designed
using 5LM instead of using 7LM?
• In your project what is die size, number of metal layers, technology,
foundry, number of clocks?
• How many macros in your design?
• What is each macro size and no. of standard cell count?
• How did you handle the clock in your design?
• What are the input needs for your design?
• What does the SDC constraint file contain?
• How did you do power planning?
• How do you find the total chip power?
• How do you calculate the core ring width, macro ring width and strap or trunk
width?
• How do you find the number of power pads and IO power pads?
• What problems did you face related to timing?
• How did you resolve the setup and hold problems?
• If your design has 10,000 or more problems, what will
you do?
• Which layer do you prefer for clock routing and why?
• If your design has a reset pin, will it affect the input pin, the output pin, or
both?
• During power analysis, if you faced an IR drop problem, how did you
avoid it?
• Define the antenna problem. How did you resolve it?
• How delays vary with different PVT conditions? Show the graph.
• Explain the flow of physical design and inputs and outputs for each step in
flow.
• What is cell delay and net delay?
• What are delay models and what is the difference between them?
• What is wire load model?
• What do SDC constraints contain?
• Why are higher metal layers preferred for Vdd and Vss?
• What is logic optimization? Give some methods of logic optimization.
• What is the significance of negative slack?
• What is signal integrity? How does it affect timing?
• What is IR drop? How can it be avoided? How does it affect timing?
• What is EM and what are its effects?
• What are floor planning and power planning?
• What are the types of routing?
• What is a grid? Why do we need it, and what are the different types of grids?
• What is the core, and how will you decide the w/h ratio for the core?
• What are effective utilization and chip utilization?
• What is latency? What are its types?
• How are the metal width and number of straps calculated for power and
ground?
• What is negative slack? How does it affect timing?
• What is track assignment?
• What are gridded and gridless routing?
• What are a macro and a standard cell?
• What is congestion?
• Is congestion related to placement or routing?
• What are clock trees?
• What are the clock tree types?
• Which layer is used for clock routing and why?
• What is cloning and buffering?
• What are placement blockages?
• How do slow and fast transitions at inputs affect timing for gates?
• What is the antenna effect?
• What are DFM issues?
• What are .lib, LEF, DEF and .tf?
• What is the difference between synthesis and simulation?
• What are metal density and the metal slotting rule?
• What are OPC and PSM?
• Why is the clock not synthesized in DC?
• What are high-Vt and low-Vt cells?
• What do corner cells contain?
• What is the difference between core filler cells and metal fillers?
• How do you decide the number of pads in a chip-level design?
• What are tie-high and tie-low cells, and where are they used?
• What is LEF?
• What is DEF?
• What are the steps involved in designing an optimal pad ring?
• What are the steps that you have done in the design flow?
• What are the issues in floor planning?
• How can you estimate the area of a block?
• What aspect ratio should be kept (or have you kept), and what is the
utilization?
• How do you calculate core ring and stripe widths?
• What if a hot spot is found in some area of the block? How do you tackle it?
• If you still have a hot spot after adding stripes, what will you do?
• What is threshold voltage? How does it affect timing?
• What are the contents of lib, lef and sdc?
• What is meant by 9-track and 12-track standard cells?
• What is a scan chain? What if the scan chain is not detached and reordered? Is it
compulsory?
• What are setup and hold? Why are they needed? What if setup or hold is violated?
• In a circuit, for a reg-to-reg path, Tclktoq is 50 ps, Tcombo is 50 ps, Tsetup is 50 ps
and tskew is 100 ps. What is the maximum operating frequency?
• How do R and C values affect timing?
• How are ohm (R) and farad (C) related to second (T)?
• What is transition? What if the transition time is high?
• What is the difference between a normal buffer and a clock buffer?
• What is the antenna effect? How is it avoided?
• What is ESD?
• What is crosstalk? How can you avoid it?
• How does double spacing avoid crosstalk?
• What is the difference between HFN synthesis and CTS?
• What is a hold problem? How can you avoid it?
• For one iteration we have 0.5 ns insertion delay and 0.1 skew, and for another
iteration 0.29 ns insertion delay and 0.25 skew, for the same circuit. Which one will
you select, and why?
• What is partial floor plan?
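For the reg-to-reg question with 50 ps clk-to-q, 50 ps combinational delay, 50 ps setup and 100 ps skew, the answer depends on the sign of the skew. Below is a minimal Python sketch. It assumes skew = capture clock arrival minus launch clock arrival, so positive skew means the capture clock is late, which helps setup.

```python
def max_frequency_ghz(t_clk2q_ps, t_comb_ps, t_setup_ps, skew_ps):
    """Setup-limited maximum frequency.

    Assumed convention: skew_ps = capture clock arrival - launch clock arrival,
    so positive skew gives the data path extra time.
    """
    min_period_ps = t_clk2q_ps + t_comb_ps + t_setup_ps - skew_ps
    if min_period_ps <= 0:
        raise ValueError("skew is larger than the path delay; check the inputs")
    return 1000.0 / min_period_ps  # 1 / ps = 1000 GHz


for skew in (+100, -100):
    f = max_frequency_ghz(50, 50, 50, skew)
    print(f"skew {skew:+d} ps -> max frequency {f:.1f} GHz")
```

Positive skew gives a 50 ps period (20 GHz), while negative skew gives a 250 ps period (4 GHz). The sign convention therefore has to be stated before answering. Note that positive skew also tightens the hold check by the same 100 ps.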
2 comments:
Alexander said...
some of the Answers to these questions can be found at the below mentioned
location:
http://www.vlsichipdesign.com/asic_vlsi_faq/faq_page1.html
Anil said...
Hi,
Physical Design Questions and Answers
What parameters (or aspects) differentiate Chip Design and Block level design?
• Chip design uses all the metal layers available; block design may not use all metal
layers.
How do you place macros in a full chip design?
• First check the flylines, i.e. check the net connections from macro to macro and from
macro to standard cells.
• If there are many connections between two macros, place those macros near each
other, preferably near the core boundaries.
• If an input pin is connected to a macro, it is better to place the macro near that pin or pad.
• If a macro has many connections to standard cells, spread the macros inside the core.
Differentiate between a Hierarchical Design and flat design?
• Hierarchical design takes more run time; flattened design takes less run time.
Which is more complicated when you have a 48 MHz and a 500 MHz clock design?
• 500 MHz, because it is more constrained (i.e. it has a shorter clock period) than the
48 MHz design.
What input files will you give for PrimeTime correlation?
If routing congestion exists between two macros, what will you do?
How will you decide the die size?
• By checking the total area of the design, you can decide the die size.
If a lengthy metal layer is connected to diffusion and poly, which one will be affected
by the antenna problem?
• Poly
If the full chip design is routed with 7 metal layers, why are macros designed using
5LM instead of 7LM?
• Because the top two metal layers are required for global routing in the chip design. If
the top metal layers are also used at block level, they will create routing blockages.
In your project, what are the die size, number of metal layers, technology, foundry and
number of clocks?
• Die size: state it in mm, e.g. 1 mm x 1 mm; remember that 1 mm = 1000 microns, which is
a big size!
• Foundry: again, look into the tech files; e.g. TSMC, IBM, etc.
How many macros are in your design?
• You know it well, as you have designed it! A SoC (System on Chip) design may even
have 100 macros.
What are the input needs for your design?
• For physical design: netlist, technology library, constraints, standard cell library.
What does the SDC constraint file contain?
• Clock definitions
• Timing exceptions: multicycle paths, false paths
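How do you calculate the core ring width, macro ring width and strap or trunk width?
• Get the total core power consumption and divide it by the supply voltage to get the
total current. Get the maximum current density of the metal layer from the tech file.
Divide the total current by the number of sides of the chip, then divide the result by the
current density to get the core power ring width. Then calculate the number of straps using
some more equations. This will be explained in detail later.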
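Below is a rough Python sketch of the ring-width estimate. Its assumptions: the total current is P/VDD, it is shared equally by the four sides of the ring, and the allowed current density is given in mA per µm of metal width. All input values are hypothetical.

```python
def core_ring_width_um(total_power_w, vdd_v, j_max_ma_per_um, sides=4):
    """Estimate the core power ring width (um) from power and the EM current limit."""
    total_current_ma = total_power_w / vdd_v * 1000.0  # I = P / V, converted to mA
    current_per_side_ma = total_current_ma / sides     # assume equal sharing per side
    return current_per_side_ma / j_max_ma_per_um       # width = I_side / J_max


# Hypothetical: 0.5 W core, 1.2 V supply, 1 mA/um allowed on the ring metal.
width = core_ring_width_um(0.5, 1.2, 1.0)
print(f"core ring width ~ {width:.1f} um")
```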
Which layer do you prefer for clock routing and why?
• The next layer below the top two metal layers (the global routing layers), because it has
lower resistance and hence lower RC delay.
If your design has a reset pin, will it affect the input pin, the output pin, or both?
• Output pin.
During power analysis, if you faced an IR drop problem, how did you avoid it?
Define the antenna problem. How did you resolve it?
• A longer net can accumulate more charge during manufacturing of the device due to the
ionization (plasma) process. If this net is connected to the gate of a MOSFET, the charge
can damage the gate dielectric, and the gate may conduct, damaging the MOSFET. This is
the antenna problem.
• Decrease the length of the net by providing more vias and layer jumping.
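The antenna check is a ratio test: the conductor area connected to a gate divided by the gate area. The minimal Python sketch below assumes a per-layer check, where only the metal connected to the gate while that layer is being etched counts. The limit of 400 and the geometry are hypothetical. The sketch shows why layer jumping helps: the lower-layer piece attached to the gate becomes shorter.

```python
MAX_RATIO = 400.0  # hypothetical antenna rule for this layer


def antenna_ratio(metal_area_um2, gate_area_um2):
    """Conductor area connected to the gate divided by the gate area."""
    return metal_area_um2 / gate_area_um2


def violates(metal_area_um2, gate_area_um2, limit=MAX_RATIO):
    return antenna_ratio(metal_area_um2, gate_area_um2) > limit


gate_area = 0.5  # um^2, hypothetical gate area on the net
m2_width = 0.1   # um, hypothetical metal2 width

# Long 3000 um metal2 run attached directly to the gate.
long_run = m2_width * 3000
# Same route, but jumping up to a higher layer after 1000 um of metal2:
# while metal2 is etched, only the 1000 um piece is connected to the gate.
short_run = m2_width * 1000

print("long run :", violates(long_run, gate_area), antenna_ratio(long_run, gate_area))
print("with jump:", violates(short_run, gate_area), antenna_ratio(short_run, gate_area))
```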
How do delays vary with different PVT conditions? Show the graph.
• P increase -> delay increase
• P decrease -> delay decrease
• V increase -> delay decrease
• V decrease -> delay increase
• T increase -> delay increase
• T decrease -> delay decrease
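The voltage trend can be sketched with the alpha-power-law delay model, delay ∝ VDD / (VDD - Vth)^α. This model is an assumption here and is not taken from the text. The Vth = 0.35 V and α = 1.3 values are placeholders, and temperature is not modelled.

```python
def relative_delay(vdd, vth=0.35, alpha=1.3):
    """Alpha-power-law gate delay up to a constant factor (assumed model)."""
    if vdd <= vth:
        raise ValueError("model only valid for Vdd > Vth")
    return vdd / (vdd - vth) ** alpha


nominal = relative_delay(1.0)
for vdd in (0.8, 0.9, 1.0, 1.1, 1.2):
    print(f"Vdd={vdd:.1f} V  delay={relative_delay(vdd) / nominal:.2f}x nominal")
```

The printed values reproduce the "V increase -> delay decrease" trend. Plotting them gives the requested graph along the voltage axis.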
Explain the flow of physical design and the inputs and outputs for each step in the flow.
What are cell delay and net delay?
• Gate delay
• Transistors within a gate take a finite time to switch. This means that a change on
the input of a gate takes a finite time to cause a change on the output.[Magma]
• Cell delay
• For any gate it is measured between 50% of input transition to the corresponding
50% of output transition.
• Intrinsic delay
• Intrinsic delay is the delay internal to the gate, from the input pin of the cell to the
output pin of the cell.
• It is defined as the delay between an input and output pair of a cell when a near-zero
slew is applied to the input pin and the output does not see any load. It is
predominantly caused by the internal capacitance associated with its transistors.
• This delay is largely independent of the size of the transistors forming the gate.
Increasing transistor size increases the drive strength, but it also increases the internal
capacitance, so the two effects largely cancel.
• Net Delay (or wire delay)
• The difference between the time a signal is first applied to the net and the time it
reaches the other devices connected to that net.
• It is due to the finite resistance and capacitance of the net. It is also known as wire
delay.
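The 50%-to-50% definition of cell delay can be sketched in Python on sampled waveforms. The sample data are hypothetical, and linear interpolation between samples is an assumption of the sketch.

```python
def crossing_time(times, volts, level, rising=True):
    """First time the waveform crosses `level`, using linear interpolation."""
    for i in range(1, len(times)):
        v0, v1 = volts[i - 1], volts[i]
        crossed = (v0 < level <= v1) if rising else (v0 > level >= v1)
        if crossed:
            frac = (level - v0) / (v1 - v0)
            return times[i - 1] + frac * (times[i] - times[i - 1])
    raise ValueError("waveform never crosses the level")


VDD = 1.0
t = [0.0, 10.0, 20.0, 30.0, 40.0, 50.0]  # ps, hypothetical sample times
vin = [0.0, 0.25, 0.75, 1.0, 1.0, 1.0]   # rising input
vout = [1.0, 1.0, 0.9, 0.5, 0.1, 0.0]    # falling output (e.g. an inverter)

t_in = crossing_time(t, vin, 0.5 * VDD, rising=True)
t_out = crossing_time(t, vout, 0.5 * VDD, rising=False)
print(f"cell delay = {t_out - t_in:.1f} ps")
```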
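What are delay models and what is the difference between them?
• A wire load model (WLM) estimates the R and C of a net from its fanout, whereas the
non-linear delay model (NLDM) gives cell delay from tables indexed by input transition
and output load.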
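An NLDM cell-delay lookup is a 2-D table indexed by input transition and output load, with interpolation between the table points. The minimal Python sketch below uses bilinear interpolation, with linear extrapolation outside the table. The index values and delays are hypothetical and are not taken from any real .lib file.

```python
from bisect import bisect_right


def _bracket(axis, x):
    """Indices of the two table points around x (clamped to the table edges)."""
    i = min(max(bisect_right(axis, x) - 1, 0), len(axis) - 2)
    return i, i + 1


def nldm_lookup(slews, loads, table, slew, load):
    """Bilinear interpolation in table[slew_index][load_index]."""
    i0, i1 = _bracket(slews, slew)
    j0, j1 = _bracket(loads, load)
    ts = (slew - slews[i0]) / (slews[i1] - slews[i0])
    tl = (load - loads[j0]) / (loads[j1] - loads[j0])
    low = table[i0][j0] + tl * (table[i0][j1] - table[i0][j0])
    high = table[i1][j0] + tl * (table[i1][j1] - table[i1][j0])
    return low + ts * (high - low)


# Hypothetical 3x3 delay table (ns): rows = input slew (ns), cols = load (pF).
slews = [0.01, 0.10, 0.50]
loads = [0.001, 0.010, 0.050]
table = [
    [0.020, 0.045, 0.150],
    [0.030, 0.055, 0.160],
    [0.060, 0.085, 0.190],
]
print(f"delay = {nldm_lookup(slews, loads, table, 0.055, 0.0055):.4f} ns")
```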
Why are higher metal layers preferred for Vdd and Vss?
What is logic optimization? Give some methods of logic optimization.
• Upsizing
• Downsizing
• Buffer insertion
• Buffer relocation
• Dummy buffer placement
What is signal integrity? How does it affect timing?
• IR drop, electromigration (EM), crosstalk and ground bounce are signal integrity
issues.
• There is a resistance associated with each metal layer. Current flowing through this
resistance dissipates power and causes a voltage drop, i.e. IR drop.
• Due to high current flow, atoms of the metal can be displaced from their original place.
When this happens in large amounts, the metal can open or the metal layer can bulge.
This effect is known as electromigration.
What are the types of routing?
• Global Routing
• Track Assignment
• Detail Routing
What is latency? What are its types?
• Source Latency
• It is defined as "the delay from the clock origin point to the clock definition point
in the design".
• Delay from the clock source to the beginning of the clock tree (i.e. the clock definition point).
• The time a clock signal takes to propagate from its ideal waveform origin point to
the clock definition point in the design.
• Network latency
• The time the clock signal (rise or fall) takes to propagate from the clock definition
point to a register clock pin.
What is track assignment?
• Second stage of routing, wherein particular metal tracks (or layers) are
assigned to the signal nets.
What is congestion?
• If the number of routing tracks available is less than the number of required
tracks, it is known as congestion.
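Congestion is usually evaluated per global-routing cell (gcell). The minimal Python sketch below assumes the routing demand and track capacity of each gcell are already known for one routing direction. The grid values are hypothetical.

```python
def overflow_map(demand, capacity):
    """Per-gcell overflow = max(0, demand - capacity)."""
    return [[max(0, d - c) for d, c in zip(drow, crow)]
            for drow, crow in zip(demand, capacity)]


# Hypothetical 3x3 gcell grid, horizontal tracks only.
capacity = [[10, 10, 10], [10, 10, 10], [10, 10, 10]]
demand = [[6, 8, 9], [7, 13, 11], [5, 9, 10]]

ovf = overflow_map(demand, capacity)
hot = [(r, c) for r, row in enumerate(ovf) for c, v in enumerate(row) if v > 0]
print("overflow:", ovf)
print("congested gcells:", hot, "total overflow:", sum(map(sum, ovf)))
```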
What are clock trees?
• Distribution of the clock from the clock source to the sync pins of the registers.
17 comments:
Anil said...
Hi, thank you for making a blog with fabulous questions and answers on the back
end...
I have a doubt regarding the question "calculating the power ring width".
From the tech file, how do we get the maximum metal density of a layer?
Where is it available???
Murali said...
Hi cvn,
Sorry for the typing mistake...You are absolutely right ... 48 and 500 numbers
wrongly exchanged...let me correct that !
rgds
murali
Anil said...
Hi Murali,
Thank you very much for your nice clarification. I have some more doubts regarding
the question "Define antenna problem and how did you resolve these
problem?". Can we insert a buffer (to divide the lengthy metal into two) to resolve the
antenna problem?
I mean, when we insert a buffer we are inserting silicon (along with a little metal),
so it can also resolve the problem.
Murali said...
Hi anil,
The first preference is metal layer jumping.
If the antenna problem is in a lower layer, jump to a higher layer and come back again.
The last option, as you said, is to insert a buffer. But when you do that, the higher metal
layers have to come down to a lower metal layer (M1 or M2) to connect to the pins of the
buffer and then go back up. Also, there may not be enough space for buffer insertion
(remember that we do the antenna check after routing). This may lead to congestion and DRC
violations.
In the P&R tool you have all these options to fix the antenna problem.
rgds
murali
Anil said...
Hi Murali,
While calculating the power consumption, we add up standard cell power, macro
power and pad power. How do we know the power consumption of each of these?
rgds,
Anil
Murali said...
Please refer:
http://asic-soc.blogspot.com/2007/10/power-planning.html
rgds
murali
savita said...
hi can any one help me understanding STA with example if you have any material
pls send it to [email protected] it would be great help
thanku
savita
Murali said...
let me try....!
Anonymous said...
Hi, Thanks for this nice material, looking forward for more interesting and deep
analysis of different stages of pnr.
rgds,
Amulya.
padmavathi said...
Can you give details on how to find the die area if I know the total area from Design
Compiler? How do I estimate the die size? Can you elaborate on this?
Murali said...
Total cell area is obtained from the area report from DC. Take the square root of this;
the obtained value is the approximate height and width of the core area.
The total area report provides the area including the pads. Hence you can estimate
the extra area required for the pads.
Thus you can estimate the die size. Remember that this is just an estimate; the actual die
size can vary.
rgds
murali
padma.p said...
Thank you for your reply. In DC we don't know how much area is needed for net routing.
You gave an example floor plan using SAMM (systolic array matrix multiplier).
Can you explain on what basis you estimated that?
Murali said...
Since over-the-cell routing is very common in all EDA tools, we need not worry
about the area required for nets.
Required Inputs:
Calculations:
Total standard cell area = no. of standard cells * one standard cell area
(Alternatively, this can be obtained directly from the DC area report.)
Core size = Standard cell area / Utilization (assuming there are no hard macros; if
there are, add their area also)
= X um * Y um.
Die area = [Core width + 2 * (PG ring width + core offset) + 2 * pad height] *
[Core height + 2 * (PG ring width + core offset) + 2 * pad height]
= A um * B um
= AB um2
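The die-size estimate can be written as a small Python sketch. The following assumptions are not from the original: the core is square (aspect ratio 1), the PG ring and core offset appear on both sides of the core, and the macro area is added after dividing the standard cell area by the utilization. All input values are hypothetical.

```python
import math


def estimate_die(std_cell_area_um2, utilization, macro_area_um2=0.0,
                 ring_width_um=0.0, core_offset_um=0.0, pad_height_um=0.0):
    """Return (core_side, die_side) in um for a square core."""
    core_area = std_cell_area_um2 / utilization + macro_area_um2
    core_side = math.sqrt(core_area)
    # Ring and offset on both sides of the core, pads on both sides of the die.
    die_side = core_side + 2 * (ring_width_um + core_offset_um) + 2 * pad_height_um
    return core_side, die_side


core, die = estimate_die(std_cell_area_um2=490_000, utilization=0.7,
                         ring_width_um=20, core_offset_um=10, pad_height_um=100)
print(f"core ~ {core:.0f} um x {core:.0f} um, die ~ {die:.0f} um x {die:.0f} um")
```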
muju said...
If the full chip design is routed by 7 layer metal, why are macros designed using
5LM instead of 7LM?
* Because top two metal layers are required for global routing in chip design. If
top metal layers are also used in block level it will create routing blockage.
Please reply.
Mujtaba Ahmed
K.K. said...
Routing blockages are used to prevent metal layers from being routed in a particular chip
area.
Mantu said...
Can someone explain to me what the difference is between set_input_delay and
set_driving_cell in DC?
Placement
Before the start of placement optimization all Wire Load Models (WLM) are removed.
Placement uses RC values from Virtual Route (VR) to calculate timing. VR is the
shortest Manhattan distance between two pins. VR RCs are more accurate than WLM
RCs.
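Below is a minimal Python sketch of a virtual-route estimate for a two-pin net: the Manhattan distance between the pins multiplied by hypothetical per-micron R and C values. The delay uses the Elmore delay of a uniform distributed RC wire, 0.5·R·C. That formula is an assumption for illustration and not Astro's internal model.

```python
def manhattan_um(p, q):
    """Shortest rectilinear (Manhattan) distance between two pins."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def virtual_route_rc(p, q, r_ohm_per_um=0.08, c_ff_per_um=0.2):
    """Wire length (um), R (ohm), C (fF) and Elmore delay (ps); per-um values are hypothetical."""
    length = manhattan_um(p, q)
    r = r_ohm_per_um * length
    c_ff = c_ff_per_um * length
    delay_ps = 0.5 * r * c_ff * 1e-3  # ohm * fF = 1e-15 s = 1e-3 ps
    return length, r, c_ff, delay_ps


length, r, c, d = virtual_route_rc((10.0, 20.0), (310.0, 420.0))
print(f"L={length:.0f} um  R={r:.1f} ohm  C={c:.1f} fF  Elmore~{d:.2f} ps")
```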
Placement optimization is carried out in the following stages:
1. Pre-placement optimization
2. In-placement optimization
3. Post Placement Optimization (PPO) before clock tree synthesis (CTS)
4. PPO after CTS.
In-placement optimization re-optimizes the logic based on VR. It can perform cell
sizing, cell moving, cell bypassing, net splitting, gate duplication, buffer insertion and
area recovery. The optimization iterates setup fixing, incremental timing and
congestion-driven placement.
Post-placement optimization before CTS performs netlist optimization with ideal
clocks. It can fix setup, hold and max trans/cap violations. It can do placement
optimization based on global routing, and it redoes HFN synthesis.
Post-placement optimization after CTS optimizes timing with the propagated clock. It tries
to preserve the clock skew.
Reference
[1] Astro User Guide, Version X-2005.09, September 2005
Timing analysis at the back end requires knowledge of all clock-related constraints provided
at the front end. When the .sdc file is given to a physical design tool (like Astro), its first
step is to remove all Wire Load Models (WLM), which are used for front-end timing analysis. In
the back end there is no such thing as a wire load model: actual delays are calculated based
on the RC values of the metal layers. All RC values, such as sidewall, junction and fringe
capacitances, are stored in Table Look Up (TLU) format in the technology file.
In back-end design, hold violations have higher priority than setup violations, because a
hold violation does not depend on the clock period and therefore cannot be fixed by slowing
down the clock. A setup violation, in contrast, can be eliminated by slowing down the clock.
The goal of placement and routing is always to meet the timing constraints provided by the
.sdc file. If latency and uncertainty are not set for the clock at the front end, Clock Tree
Synthesis (CTS) cannot be done at the back end.
Cell delay depends on transition, timing arcs and capacitances, while net delay is
determined by RCs only. Cell delays are available in the libraries, and net delays are
specified in the technology files (in the front end they come from the WLM). Cell delay
tables are fixed. Net delays are not fixed; they depend on interconnect length and width.
The net delay parameters Rnet and Cnet are available as Table Look Up (TLU) provided by the
vendor.
There is one more set of files, TLU+, which accounts for Ultra Deep Sub Micron (UDSM)
effects that are not included in the TLU file. A mapping file maps TLU to TLU+.
UDSM effects such as Optical Proximity Correction (OPC), Resolution Enhancement
Technology (RET) and Litho Compliance Check (LCC) are not taken care of by Astro.
For the placement stage, the virtual RC (based on Manhattan distance) Layout Parasitic
Extraction (LPE) mode is used. For CTS, real R and virtual C are used, and for routing,
real RC is used.
The clock definition given to SAMM in the front-end design flow, generated as an .sdc file
from Design Compiler, is given below. It includes the clock frequency, rise and fall times,
setup and hold, skew and insertion delay.
#####################################################
# Created by Design Compiler write_sdc on Fri May 11 18:35:45 2007
#####################################################
create_clock -period 4.85 -waveform {0 2.425} [get_ports {clock}]
set_clock_transition -rise 0.04 [get_clocks {clock}]
set_clock_transition -fall 0.04 [get_clocks {clock}]
set_clock_uncertainty 0.485 -setup [get_clocks {clock}]
set_clock_uncertainty 0.27 -hold [get_clocks {clock}]
set_clock_latency 0.45 [get_clocks {clock}]
set_clock_latency -source 0.45 [get_clocks {clock}]
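The numbers in this constraint file can be sanity-checked with plain arithmetic. Below is a short Python sketch. The values are copied by hand from the .sdc above; nothing here parses SDC.

```python
period_ns = 4.85
waveform = (0.0, 2.425)        # rising edge at 0 ns, falling edge at 2.425 ns
setup_uncertainty_ns = 0.485
network_latency_ns = 0.45      # set_clock_latency 0.45
source_latency_ns = 0.45       # set_clock_latency -source 0.45

freq_mhz = 1000.0 / period_ns
duty = (waveform[1] - waveform[0]) / period_ns
setup_budget_ns = period_ns - setup_uncertainty_ns
total_latency_ns = source_latency_ns + network_latency_ns

print(f"frequency                  ~ {freq_mhz:.1f} MHz")
print(f"duty cycle                 = {duty:.0%}")
print(f"setup uncertainty          = {setup_uncertainty_ns / period_ns:.0%} of period")
print(f"budget for clk-to-q+logic+setup ~ {setup_budget_ns:.3f} ns")
print(f"total ideal clock latency  = {total_latency_ns:.2f} ns")
```

The setup uncertainty of 0.485 ns is exactly 10% of the 4.85 ns period. That leaves about 4.365 ns for clk-to-q, logic and setup on a reg-to-reg path.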
The unit tile height is 2.52 µm for the lvt cells and 1.96 µm for the hvt cells. Hence two
separate unit tiles have to be created and added to the technology file. The Hvt reference
library is created with the unit tile name “unit”, and the lvt reference library is created
with the unit tile name “lvt_unit”. By default the “unit” tile is defined in the technology
file, so the other unit tile, “lvt_unit”, is also added to the technology file.
Figure 2. Tile height specifications in library preparation
Floor Planning
A core utilization of 70% is specified and the aspect ratio is kept at 1. Rows are flipped,
double-backed and made channel-less. No Top Design Format (TDF) file is selected, as the
default placement of the IO pins is used. Since the reference library contains cells of
multiple heights, separate placement rows have to be provided for the two different unit
tiles. The core area is divided into two separate unit-tile sections, with the larger area
given to the Hvt unit tile, as shown in Figure 3.
Figure 3. Different unit tile placement
First, as per the default floor planning flow, rows are constructed with the unit tile. Later,
rows are deleted from part of the core area and new rows are inserted with the tile
“lvt_unit”. Improper allotment of area can give rise to congestion, so several trial-and-error
iterations were conducted to find the most suitable area for the two different unit
tiles. The “unit” tile covers 44.36% of the core area, while “lvt_unit” covers 65.53% of the core area.
The PR summary report of the design after the floor planning stage is provided below.
PR Summary:
Chip Utilization:
PR Summary:
[Tile Utilization]
============================================================
============================================================
But this method of placement generates unacceptable congestion around the junction area
of two separate unit tile sections. The congestion map is shown in Figure 4.
Figure 4. Congestion
There are two congestion maps. The first corresponds to the floorplan with an aspect ratio of 1 and a core utilization of 70%. It shows horizontal congestion above the limit of 1 all over the core area, meaning that the design cannot be routed at all. Hence the core area has to be increased by specifying its height and width explicitly. The second congestion map is generated with a floorplan whose core size is set to 950 µm. Here we can observe that although congestion has been reduced over the core area, it is still a concern where the two different unit tile sections meet, as marked by the circle. Nevertheless, the design is routable and can be carried to the next stages of the place and route flow, provided timing is met in subsequent implementation steps.
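The congestion maps rate each global-routing cell (gcell) by the ratio of routing demand to available tracks. A value above 1 means more wires need to pass through the gcell than it has tracks. The sketch below computes that ratio and the total overflow for a small, made-up grid of horizontal demand and capacity. The numbers are illustrative only and are not taken from the design above.

```python
def congestion_map(demand, capacity):
    """Return per-gcell demand/capacity ratios and total overflow in tracks.

    demand and capacity are equally shaped 2-D lists; every capacity must be > 0.
    """
    ratios = []
    overflow = 0
    for d_row, c_row in zip(demand, capacity):
        ratios.append([d / c for d, c in zip(d_row, c_row)])
        # Only tracks beyond the available capacity count as overflow.
        overflow += sum(max(0, d - c) for d, c in zip(d_row, c_row))
    return ratios, overflow


if __name__ == "__main__":
    # Hypothetical 3x4 grid of horizontal track demand; every gcell offers 10 tracks.
    demand = [
        [6, 8, 12, 7],
        [7, 11, 14, 9],
        [5, 8, 10, 6],
    ]
    capacity = [[10] * 4 for _ in range(3)]
    ratios, overflow = congestion_map(demand, capacity)
    hot = [(r, c) for r, row in enumerate(ratios) for c, v in enumerate(row) if v > 1.0]
    print("overflowing gcells:", hot)
    print("total overflow:", overflow)
```

A map in which almost every gcell exceeds 1, as in the first floorplan, leaves the router no detour, while a few hot gcells at a section boundary can sometimes be routed around.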
Tighter timing constraints and denser interconnections between standard cells around the junction of the different unit tiles have led to more congestion. It is observed that increasing the area isn't a solution to this congestion. In addition, the situation worsens with the tool's timing optimization effort, and the timing target cannot be met. The optimization process inserts several buffers around the junction area, and some of them are placed illegally due to the lack of placement area.
Timing/Optimization Information:
[TIMING]
Type Slack Num Total Target Slack Num Trans MaxCap Time
Since timing cannot be met, the design has to be abandoned before the subsequent steps. Hence, in a multi-Vt design flow, cell libraries with multiple cell heights are not preferred.
• Below is a sequence of questions asked of physical design engineers.
• The answers to these questions depend on your interests, your expertise and the requirements of the position for which you are being interviewed.
If you have both IR drop and congestion how will you fix it?
• Spread macros
• Spread standard cells
• Increase strap width
• Increase number of straps
• Use proper blockage
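Two of the fixes above, wider straps and more straps, both work by lowering the resistance of the power grid. A first-order estimate treats each strap as a resistor R = R_sheet × L / W. When N identical straps share the current equally they act as resistors in parallel, so IR drop ≈ I × R / N. The sheet resistance, strap length and current below are hypothetical, and a real analysis relies on the full grid model in the IR-drop tool.

```python
def strap_ir_drop(current_a, sheet_res_ohm_per_sq, length_um, width_um, num_straps):
    """First-order IR drop (V) when num_straps identical straps share current_a equally."""
    if width_um <= 0 or num_straps < 1:
        raise ValueError("width must be positive and at least one strap is needed")
    strap_res = sheet_res_ohm_per_sq * length_um / width_um  # ohms: R_sheet x (L / W squares)
    return current_a * strap_res / num_straps  # N equal resistors in parallel -> R / N


if __name__ == "__main__":
    # Hypothetical values: 50 mA total, 0.04 ohm/sq metal, 500 µm long straps.
    baseline = strap_ir_drop(0.05, 0.04, 500.0, 2.0, 4)
    wider = strap_ir_drop(0.05, 0.04, 500.0, 4.0, 4)
    more = strap_ir_drop(0.05, 0.04, 500.0, 2.0, 8)
    print(f"baseline: {baseline * 1e3:.1f} mV")
    print(f"2x strap width: {wider * 1e3:.1f} mV")
    print(f"2x strap count: {more * 1e3:.1f} mV")
```

In this model, doubling either the strap width or the strap count halves the estimated drop. Spreading macros and cells works on the other factor, the current drawn in one region.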
Are increasing the power line width and providing more straps the only solutions to IR drop?
• Spread macros
• Spread standard cells
• Use proper blockage
In a reg-to-reg path with a setup problem, where will you insert a buffer: near the launching flop or near the capturing flop? Why?
• (Buffers are inserted to fix fanout violations and hence reduce setup violations; otherwise we try to fix setup violations by sizing cells. For now, just assume that you must insert a buffer!)
• Insert it near the capturing flop, because there may be other paths passing through or originating from the launch flop. A buffer inserted near the launch flop may therefore affect those paths as well, improving or degrading them. If all of those paths have violations, you may insert the buffer nearer to the launch flop, provided it improves slack.
• If the clocks come from separate clock sources (i.e. asynchronous; from different pads or pins), then balancing skew between these clock sources becomes challenging.
• Switching of the signal on one net can interfere with a neighbouring net through cross-coupling capacitance. This effect is known as crosstalk. Crosstalk may lead to setup or hold violations.
• High-frequency noise (or a glitch) is coupled to VSS (or VDD), since the shield layers are connected to either VDD or VSS.
• Why the clock? Because it is the signal that changes its state regularly and more often than any other signal. If any other signal switches fast, we can use double spacing for it as well.
• Buffers increase the victim's signal strength and break up the net length, so victims become more tolerant to the signal coupled from the aggressor.
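The crosstalk answers above can be made concrete with a common first-order model. The coupling capacitance between two parallel wires grows with their run length and falls roughly in inverse proportion to their spacing. The victim's driver sees that capacitance scaled by a switching (Miller) factor: 0 when the aggressor switches in the same direction, 1 when it is quiet, and 2 when it switches in the opposite direction. A larger effective load slows the victim (setup risk), and a smaller one speeds it up (hold risk). The coefficient `k_ff` and all numbers below are hypothetical, and the 1/spacing approximation ignores fringing, so treat the output only as a trend.

```python
from enum import Enum


class AggressorActivity(Enum):
    """Miller factor applied to the coupling capacitance seen by the victim."""
    SAME_DIRECTION = 0.0  # aggressor switches together with the victim
    QUIET = 1.0           # aggressor (or a VDD/VSS shield) holds its value
    OPPOSITE = 2.0        # aggressor switches against the victim


def coupling_cap_ff(length_um, spacing_um, k_ff=0.01):
    """Rough coupling capacitance in fF: grows with run length, falls as 1/spacing.

    k_ff is a hypothetical coefficient (fF per µm of run length at 1 µm spacing).
    """
    if spacing_um <= 0:
        raise ValueError("spacing must be positive")
    return k_ff * length_um / spacing_um


def victim_load_ff(ground_cap_ff, cc_ff, activity):
    """Effective capacitance (fF) the victim's driver must switch."""
    return ground_cap_ff + activity.value * cc_ff


if __name__ == "__main__":
    cc_single = coupling_cap_ff(200.0, 0.1)  # hypothetical minimum spacing
    cc_double = coupling_cap_ff(200.0, 0.2)  # double spacing
    for activity in AggressorActivity:
        print(activity.name,
              f"single: {victim_load_ff(10.0, cc_single, activity):.1f} fF,",
              f"double: {victim_load_ff(10.0, cc_double, activity):.1f} fF")
```

Doubling the spacing halves the coupling capacitance in this model, which is the intent of double-spacing clock nets. A shield tied to VDD or VSS corresponds to the QUIET case: it still adds capacitance but never switches, so it injects no noise.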
• What is the difference between hard macros, firm macros and soft macros?
• Hard macros, firm macros and soft macros are all known as IP (intellectual property). They are optimized for power, area and performance, and they can be purchased and used in your ASIC or FPGA design implementation flow. Soft macros are flexible for all types of ASIC implementation. Hard macros can be used in a pure ASIC design flow, but not in an FPGA flow. Before buying any IP, it is very important to evaluate its advantages and disadvantages relative to the other types, its hardware compatibility (such as I/O standards) with your design blocks, and its reusability for other designs.
Soft macros
• Soft macros are in synthesizable RTL.
• Soft macros carry greater IP protection risks because RTL source code is more portable and, therefore, less easily protected than either a netlist or physical layout data.
• From the physical design perspective, a soft macro is any cell that has been placed and routed in a placement and routing tool such as Astro. (This is the definition given in the Astro Rail user manual!)
• Soft macros are editable and can contain standard cells, hard macros, or other soft
macros.
Firm macros
• Firm macros are in netlist format.
• Firm macros are more flexible and portable than hard macros.
• Firm macros are more predictable in performance and area than soft macros.
Hard macro
• Hard macros are generally in the form of hardware IPs (or we term them hardware IPs!).
• Hard macros are block level designs which are silicon tested and proved.
• You are free to move, rotate and flip a hard macro, but you can't touch anything inside it.
• A very common example of a hard macro is memory. In general, it can be any design that carries a single dedicated function; for example, it can be an MP4 decoder.
• Be aware of the features and characteristics of a hard macro before you use it in your design. Other than power, timing and area, you should also know its pin properties, such as sync pins, I/O standards, etc.
• The LEF and GDS2 file formats allow macros to be used easily in different tools.
• A hard macro is a block that is generated by a methodology other than place and route (i.e. using a full custom design methodology) and is brought into the physical design database (e.g. Milkyway in Synopsys; Volcano in Magma) as a GDS2 file.
Synthesis and placement of macros in modern SoC designs are challenging. EDA tools employ different algorithms to accomplish this task while targeting power and area. Several research papers are available on these subjects, for example:
• "Hard Macro Placement in Complex SoC Design", from soccentral