Backend (Physical Design) Interview Questions and Answers
Backend (Physical Design) Interview Questions and Answers
Backend (Physical Design) Interview Questions and Answers
Manjunathagowda G C What parameters (or aspects) differentiate Chip Design and Block level design? Chip design has I/O pads; block design has pins. Chip design uses all metal layes available; block design may not use all metal layers. Chip is generally rectangular in shape; blocks can be rectangular, rectilinear. Chip design requires several packaging; block design ends in a macro. How do you place macros in a full chip design? First check flylines i.e. check net connections from macro to macro and macro to standard cells. If there is more connection from macro to macro place those macros nearer to each other preferably nearer to core boundaries. If input pin is connected to macro better to place nearer to that pin or pad. If macro has more connection to standard cells spread the macros inside core. Avoid criscross placement of macros. Use soft or hard blockages to guide placement engine. Differentiate between a Hierarchical Design and flat design? Hierarchial design has blocks, subblocks in an hierarchy; Flattened design has no subblocks and it has only leaf cells. Hierarchical design takes more run time; Flattened design takes less run time. Which is more complicated when u have a 48 MHz and 500 MHz clock design? 500 MHz; because it is more constrained (i.e.lesser clock period) than 48 MHz design. Name few tools which you used for physical verification? Herculis from Synopsys, Caliber from Mentor Graphics. What are the input files will you give for primetime correlation? Netlist, Technology library, Constraints, SPEF or SDF file.
If the routing congestion exists between two macros, then what will you do? Provide soft or hard blockage How will you decide the die size? By checking the total area of the design you can decide die size. If lengthy metal layer is connected to diffusion and poly, then which one will affect by antenna problem? Poly If the full chip design is routed by 7 layer metal, why macros are designed using 5LM instead of using 7LM? Because top two metal layers are required for global routing in chip design. If top metal layers are also used in block level it will create routing blockage. In your project what is die size, number of metal layers, technology, foundry, number of clocks? Die size: tell in mm eg. 1mm x 1mm ; remeber 1mm=1000micron which is a big size !! Metal layers: See your tech file. generally for 90nm it is 7 to 9.
Technology: Again look into tech files. Foundry:Again look into tech files; eg. TSMC, IBM, ARTISAN etc Clocks: Look into your design and SDC file !
How many macros in your design? You know it well as you have designed it ! A SoC (System On Chip) design may have 100 macros also !!!! What is each macro size and number of standard cell count? Depends on your design. What are the input needs for your design? For synthesis: RTL, Technology library, Standard cell library, Constraints For Physical design: Netlist, Technology library, Constraints, Standard cell library What is SDC constraint file contains? Clock definitions Timing exception-multicycle path, false path Input and Output delays How did you do power planning? How to calculate core ring width, macro ring width and strap or trunk width? How to find number of power pad and IO power pads? How the width of metal and number of straps calculated for power and ground? Get the total core power consumption; get the metal layer current density value from the tech file; Divide total power by number sides of the chip; Divide the obtained value from the current density to get core power ring width. Then calculate number of straps using some more equations. Will be explained in detail later. How to find total chip power? Total chip power=standard cell power consumption,Macro power consumption pad power consumption. What are the problems faced related to timing? Prelayout: Setup, Max transition, max capacitance Post layout: Hold How did you resolve the setup and hold problem? Setup: upsize the cells Hold: insert buffers In which layer do you prefer for clock routing and why? Next lower layer to the top two metal layers(global routing layers). Because it has less resistance hence less RC delay. If in your design has reset pin, then itll affect input pin or output pin or both? Output pin. During power analysis, if you are facing IR drop problem, then how did you avoid? Increase power metal layer width. Go for higher metal layer. Spread macros or standard cells. Provide more straps.
Define antenna problem and how did you resolve these problem? Increased net length can accumulate more charges while manufacturing of the device due to ionisation process. If this net is connected to gate of the MOSFET it can damage dielectric property of the gate and gate may conduct causing damage to the MOSFET. This is antenna problem. Decrease the length of the net by providing more vias and layer jumping. Insert antenna diode. How delays vary with different PVT conditions? Show the graph. P increase->dealy increase P decrease->delay decrease
Gate delay Transistors within a gate take a finite time to switch. This means that a change on the input of a gate takes a finite time to cause a change on the output.[Magma]
Gate delay =function of(i/p transition time, Cnet+Cpin). Cell delay is also same as Gate delay. Cell delay
For any gate it is measured between 50% of input transition to the corresponding 50% of output transition.
Intrinsic delay Intrinsic delay is the delay internal to the gate. Input pin of the cell to output pin of the
cell.
It is defined as the delay between an input and output pair of a cell, when a near zero slew is applied to the input pin and the output does not see any load condition.It is predominantly caused by the internal capacitance associated with its transistor.
This delay is largely independent of the size of the transistors forming the gate because increasing size of transistors increase internal capacitors. Net Delay (or wire delay) The difference between the time a signal is first applied to the net and the time it reaches other devices connected to that net.
It is due to the finite resistance and capacitance of the net.It is also known as wire delay. Wire delay =fn(Rnet , Cnet+Cpin)
Explain the flow of physical design and inputs and outputs for each step in flow.
What are delay models and what is the difference between them? Linear Delay Model (LDM) Non Linear Delay Model (NLDM) What is wire load model? Wire load model is NLDM which has estimated R and C of the net. Why higher metal layers are preferred for Vdd and Vss? Because it has less resistance and hence leads to less IR drop. What is logic optimization and give some methods of logic optimization. Upsizing Downsizing Buffer insertion Buffer relocation Dummy buffer placement What is the significance of negative slack? negative slack==> there is setup voilation==> deisgn can fail What is signal integrity? How it affects Timing? IR drop, Electro Migration (EM), Crosstalk, Ground bounce are signal integrity issues. If Idrop is more==>delay increases. crosstalk==>there can be setup as well as hold voilation. What is IR drop? How to avoid? How it affects timing? There is a resistance associated with each metal layer. This resistance consumes power causing voltage drop i.e.IR drop. If IR drop is more==>delay increases. What is EM and it effects? Due to high current flow in the metal atoms of the metal can displaced from its origial place. When it happens in larger amount the metal can open or bulging of metal layer can happen. This effect is known as Electro Migration.
What are types of routing? Global Routing Track Assignment Detail Routing What is latency? Give the types? Source Latency It is known as source latency also. It is defined as "the delay from the clock origin point to the clock definition point in the design".
Delay from clock source to beginning of clock tree (i.e. clock definition point).
The time a clock signal takes to propagate from its ideal waveform origin point to the clock definition point in the design.
Network latency
It is also known as Insertion delay or Network latency. It is defined as "the delay from the clock definition point to the clock pin of the register".
The time clock signal (rise or fall) takes to propagate from the clock definition point to a register clock pin. What is track assignment? Second stage of the routing wherein particular metal tracks (or layers) are assigned to the signal nets. What is congestion? If the number of routing tracks available for routing is less than the required tracks then it is known as congestion. Whether congestion is related to placement or routing? Routing What are clock trees? Distribution of clock from the clock source to the sync pin of the registers. What are clock tree types? H tree, Balanced tree, X tree, Clustering tree, Fish bone
Low power design Can you talk about low power techniques? How low power and latest 90nm/65nm technologies are related? Michael Keating et al. [1] lists several low power techniques to tackle the dynamic and static power consumption in modern SoC designs. Dynamic power control techniques include clock gating, multi voltage, variable frequency, and efficient circuits. Leakage power control techniques include power gating, multi Vt cells. Common methods supported by EDA tools include clock gating, gate sizing, low power placement, register clustering, low power CTS, multi Vt optimization. Some of the low power techniques in use today are listed in below table
Above table summarizes trade-offs associated with different power management techniques. Power gating and DVFS demand large methodology change whereas multi vt and clock gating affect least. Unless large leakage optimization is not necessary it is always beneficial to go with either multi vt or clock gating techniques. Based on the design complexity and requirements combination of any low power techniques can be adopted. Multi vt optimization along with the power gating is found to be efficient in some of the complex designs. Advanced improvements in the implementation (i.e. fabrication) technology has allowed substrate biasing techniques to be used heavily as it does not pose any architectural and design verification challenges and also provides high leakage Do you know about input vector controlled method of leakage reduction?
Leakage current of a gate is dependant on its inputs also. Hence find the set of inputs which gives least leakage. By applyig this minimum leakage vector to a circuit it is possible to decrease the leakage current of the circuit when it is in the standby mode. This method is known as input vector controlled method of leakage reduction.
How can you reduce dynamic power? -Reduce switching activity by designing good RTL -Clock gating -Architectural improvements -Reduce supply voltage -Use multiple voltage domains-Multi vdd What are the vectors of dynamic power?
Power Planning
There are two types of power planning and management. They are core cell power management and I/O cell power management. In former one VDD and VSS power rings are formed around the core and macro. In addition to this straps and trunks are created for macros as per the power requirement. In the later one, power rings are formed for I/O cells and trunks are constructed between core power ring and power pads. Top to bottom approach is used for the power analysis of flatten design while bottom up approach is suitable for macros. The power information can be obtained from the front end design. The synthesis tool reports static power information. Dynamic power can be calculated using Value Change Dump (VCD) or Switching Activity Interchange Format (SAIF) file in conjunction with RTL description and test bench. Exhaustive test coverage is required for efficient calculation of peak power. This methodology is depicted in Figure (1). For the hierarchical design budgeting has to be carried out in front end. Power is calculated from each block of the design. Astro works on flattened netlist. Hence here top to bottom approach can be used. JupiterXT can work on hierarchical designs. Hence bottom up approach for power analysis can be used with JupiterXT. IR drops are not found in floor planning stage. In placement stage rails are get connected with power rings, straps, trunks. Now IR drops comes into picture and improper design of power can lead to large IR drops and core may not get sufficient power.
Figure (1) Power Planning methodology Below are the calculations for flattened design of the SAMM. Only static power reported by the Synthesis tool (Design Compiler) is used instead of dynamic power.
The number of the core power pad required for each side of the chip
= total core power / [number of side*core voltage*maximum allowable current for a I/O pad] = 236.2068mW/ [4 * 1.08 V * 24mA] (Considering design SAMM) = 2.278 ~ 2 Therefore for each side of the chip 2 power pads (2 VDD and 2 VSS) are added.
Total dynamic core current (mA)
= total dynamic core current / number of sides * Jmax where Jmax is the maximum current density of metal layer used = 218.71 mA / [4 * 49.5 mA/m] = 1.104596 m Hence pad to trunk width is kept as 2m. Using below mentioned equations we can calculate verti vertical cal and horizontal strap width and required number of straps for each macro.
Block current: Current supply from each side of the block:
Iblock= Pblock / Vddcore Itop=Ibottom= { Iblock *[Wblock / (Wblock +Hblock)] }/2 Ileft=Iright= { Iblock *[Hblock / (Wblock +Hblock)] }/2
Power strap width based on EM:
Wstrap_vertical >=[ Itop * Roe * Hblock ] / 0.1 * VDD Wstrap_horizontal >=[ Ileft * Roe * Wblock ] / 0.1 * VDD
Refresh width:
Wrefresh_vertical =3 * routing pitch +minimum width of metal (M4) Wrefresh_horizontal =3 * routing pitch +minimum width of metal (M3)
Refresh number
If you have both IR drop and congestion how will you fix it? -Spread macros -Spread standard cells -Increase strap width -Increase number of straps -Use proper blockage Is increasing power line width and providing more number of straps are the only solution to IR drop? -Spread macros -Spread standard cells -Use proper blockage In a reg to reg path if you have setup problem where will you insert buffer-near to launching flop or capture flop? Why? (buffers are inserted for fixing fanout voilations and hence they reduce setup voilation; otherwise we try to fix setup voilation with the sizing of cells; now just assume that you must insert buffer !) Near to capture path. Because there may be other paths passing through or originating from the flop nearer to lauch flop. Hence buffer insertion may affect other paths also. It may improve all those paths or degarde. If all those paths have voilation then you may insert buffer nearer to launch flop provided it improves slack. How will you decide best floorplan? Refer here for floor planning. What is the most challenging task you handled? What is the most challenging job in P&R flow? -It may be power planning- because you found more IR drop -It may be low power target-because you had more dynamic and leakage power -It may be macro placement-because it had more connection with standard cells or macros -It may be CTS-because you needed to handle multiple clocks and clock domain crossings -It may be timing-because sizing cells in ECO flow is not meeting timing -It may be library preparation-because you found some inconsistancy in libraries. -It may be DRC-because you faced thousands of voilations How will you synthesize clock tree? -Single clock-normal synthesis and optimization -Multiple clocks-Synthesis each clock seperately -Multiple clocks with domain crossing-Synthesis each clock seperately and balance the skew How many clocks were there in this project? -It is specific to your project -More the clocks more challenging ! How did you handle all those clocks? -Multiple clocks-->synthesize seperately-->balance the skew-->optimize the clock tree
Are they come from seperate external resources or PLL? -If it is from seperate clock sources (i.e.asynchronous; from different pads or pins) then balancing skew between these clock sources becomes challenging. -If it is from PLL (i.e.synchronous) then skew balancing is comparatively easy. Why buffers are used in clock tree? To balance skew (i.e. flop to flop delay) What is cross talk? Switching of the signal in one net can interfere neigbouring net due to cross coupling capacitance.This affect is known as cros talk. Cross talk may lead setup or hold voilation. How can you avoid cross talk? -Double spacing=>more spacing=>less capacitance=>less cross talk -Multiple vias=>less resistance=>less RC delay -Shielding=> constant cross coupling capacitance =>known value of crosstalk -Buffer insertion=>boost the victim strength How shielding avoids crosstalk problem? What exactly happens there? -High frequency noise (or glitch)is coupled to VSS (or VDD) since shilded layers are connected to either VDD or VSS. Coupling capacitance remains constant with VDD or VSS. How spacing helps in reducing crosstalk noise? width is more=>more spacing between two conductors=>cross coupling capacitance is less=>less cross talk Why double spacing and multiple vias are used related to clock? Why clock?-- because it is the one signal which chages it state regularly and more compared to any other signal. If any other signal switches fast then also we can use double space. Double spacing=>width is more=>capacitance is less=>less cross talk Multiple vias=>resistance in parellel=>less resistance=>less RC delay
How buffer can be used in victim to avoid crosstalk? Buffer increase victims signal strength; buffers break the net length=>victims are more tolerant to coupled signal from aggressor.