Methods For Electronic System Design and Verification Lab Report DAT110


Methods for electronic system design and verification

Lab Report
DAT110
Hakim Male

Chalmers University of Technology

ALU Design and verification
To follow the modern EDA flow strategy (1, p. 21), and given that verification engineers reportedly rely heavily on simulation at the cycle level, I started the lab with the design and verification of the arithmetic logic unit (ALU), using VHDL as the preferred hardware description language and the Incisive (NCSim) suite of tools from Cadence to verify its functionality.

Fig. 1: Block diagram
Fig. 2: Timing diagram
Fig. 3: Summary diagrams of functional timing blocks

First, I developed a clearer picture of how the system modules were to be built and how they would interact with each other, as shown in Fig. 3 above. Since the system was driven by a clock, with every rising edge triggering action within the computational logic, a timing diagram was also drawn up, as in Fig. 2. After verifying the functionality with a testbench, I used coverage metrics to identify and remove variables and functions that did not participate in the overall functionality of the code. During the coverage analysis I stumbled across a situation where the functionality of my SLT calculations was red-flagged. The issue stemmed from the fact that my ALU produced 32 bits from an SLT function, but only one bit was toggled and the rest were dormant. I tried several ways around the issue so that I could achieve the required 100% coverage, but it seemed impossible at the time. The same issue cropped up during the Sklansky design and verification, as shown in the figure below.

Fig. 4: Resulting IMC coverage from both the RCA and Sklansky SLT and SLTU opcodes

Both functional and logical simulations were performed on both designs, whereby the former simulated the design description to verify its logical correctness while the latter was used primarily on the RTL models of the digital portions of the respective designs. Having testbenches for the respective designs was important to verify that the desired results were achieved. However, most of the time, NCSim's SimVision™ was employed to get a visual interpretation of when the actual timing events occurred in relation to the test vectors provided.

Synthesis and timing analysis
In performing synthesis, the ultimate goal was to create the best balance of power, performance and area (PPA) for the design proposals described above. As stated in (2, p. 5), my synthesis flow intertwined power and timing analysis with independent optimization of the netlists, which were later mapped to technology gates as shown in (2, Fig. 1.1). In this process, I used the Cadence Genus™ synthesis solution to perform a series of
tasks including extracting the complete timing and physical context of my ALU and Sklansky designs and deriving the RTL unit-level synthesis with different iterations of timing constraints and placement. The solution incorporated different rounds of datapath optimization that concurrently considered many different datapath architectures across both of my designs and then leveraged an analytical solver to reach my end goal of optimal PPA. This greatly helped me to improve my RTL design efficiency and led to more aggressive architecture-level optimizations to improve my PPA values, by performing iterative static timing analysis (STA) and tweaking it in the testbenches to see whether I had any timing violations. The iterations were done following the recommendation in (1, p. 22) to check that there are no problem delay paths.

Power analysis & Timing closure
To better understand timing closure, (3) suggests that it is a process in which primitive elements such as combinatorial or sequential logic gates are modified to meet timing requirements. Although today's computers have no explicit finite delay for performing their calculations, logic circuits have well-defined delays to propagate inputs to outputs, as illustrated in (2, Tab. 1.1). Overall, the Sklansky adder had far better timing paths than the ripple carry adder (RCA), whether constrained or unconstrained. One thing that surprised me, though, was that in theory the Sklansky adder takes much more area than the RCA. Synthesizing both with similar constraints should confirm this. However, in my case, the Sklansky adder always took up less area. I assume that this was because the constraints forced the tool to optimize the area of the Sklansky adder. I expected it to do the same with the RCA, but there I started running into some setup and hold violations.

Power consumption is and has always been a significant challenge for electronic circuits. A possible solution is to explore various design options very early in the design process (4). In my case, I was to compare which design, the RCA or the Sklansky adder, was more power-efficient. Because I was using a low-power cell library, as recommended by the Lab memo, the static and leakage portions of my power were negligible. After switching to a high-performance library and reducing the activity factor (α) of the inputs, there was a significant reduction in the switching power.

Furthermore, varying the activity factor α from 0.5 to 0.01 led to an increase in leakage power. However, increasing it to 0.25 reduced leakage power slightly, but in certain instances the timing constraints were not met. In other words, the design became slow. This is also in line with the findings of (5).

ALU place & route
According to the design flow depicted in (2, Fig. 13.9), the next step was the placement phase, in which the RTL description of the chip is implemented based on the concept phase described above. Because RTL and floorplanning are closely related, the two steps were carried out in parallel. After simulating the process with the given netlist, ALU RCA.v, I went on to try out the process with my generated ALU netlist.v file. Contrary to the process outlined in the Lab memo, I opted to start by placing the large macro objects, such as the Vdd and Vss grids, first, as I felt they gave me better control of the whole process. Additionally, since floorplanning has a large impact on the interconnect delay on critical paths (2, p. 337), it took precedence in my view. I also made the power rails a bit wider (up to 3.0 µm), as recommended in (2, p. 339), in the hope that I would pre-emptively meet electromigration constraints. I then carried on with the placement of pins, placing the input pins on one side of the chip and the outputs on the opposite side, using Metal 1 (M1) from top to bottom and Metal 2 (M2) from left to right.

Clock tree synthesis (CTS)
In this section, I performed CTS to minimize clock skew and insertion delay. Prior to that, I performed the pre-CTS placement optimization, whose main purpose was to apply placement-, timing- and congestion-aware transformations to the logic to fix all the critical setup timing paths and power problems left over from the placement phase, as outlined in (2, section 13.2.6). First, electrical problems are corrected, such as maximum input slew violations on signal sink pins and maximum load capacitance violations on output pins. Gate sizing, buffering of large fanout networks, and repeater insertion all help to correct these problems. Furthermore, the design is timed and optimized using a worst negative slack (WNS) and total negative slack (TNS) slack-takedown approach. I analysed the generated reports and realized that whenever there were setup violations, WNS was the worst of all negative slacks. TNS, on the other hand, was the sum of all negative slacks. The procedure was complete when there were no more resolvable violations.

I then went on to the real CTS operation, whose main purpose was to implement the clock distribution logic. As seen from (2, section 13.2.7), the main issues to deal with in this phase were performance, signal integrity, manufacturability, and power supply integrity.

After the CTS procedure, clock buffers had visibly been added. This added congestion to the design, and some non-clock cells were consequently moved to less ideal locations, which introduced a small timing violation. I also realized that the clock skew objectives were met, i.e. with a positive slack value. The hold time, however, was not met, because the data path was not yet routed at that point. But from an overall timing perspective, the resulting timing was immensely improved compared to the previous procedures.

The main purpose of the post-CTS optimization was to clean up the performance problems caused by the introduction of clocks (2, section 13.2.8), and this is where hold time violations were addressed. The added accuracy of timing the real clock distribution logic allowed for the correction of all hold time problems.
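
As a minimal sketch of how the WNS and TNS figures discussed in the clock tree synthesis section relate to each other, the following Python snippet computes both from a list of endpoint setup slacks. The clock period, setup time, arrival times and function names are hypothetical values chosen for illustration, not figures from the lab reports, and clock skew is ignored. Reporting WNS as 0 when no endpoint violates timing is also an assumption; tools may report it differently.

```python
def setup_slack(clock_period_ps, setup_time_ps, arrival_ps):
    """Setup slack = required time - arrival time (clock skew ignored)."""
    required_ps = clock_period_ps - setup_time_ps
    return required_ps - arrival_ps


def worst_negative_slack(slacks):
    """Most negative slack, or 0 if no endpoint violates timing."""
    return min(0, min(slacks, default=0))


def total_negative_slack(slacks):
    """Sum of all negative slacks; endpoints that meet timing add nothing."""
    return sum(s for s in slacks if s < 0)


# Hypothetical example values, in picoseconds.
CLOCK_PERIOD_PS = 1000
SETUP_TIME_PS = 50
arrival_times_ps = [830, 985, 1030, 950, 935, 955]

slacks = [
    setup_slack(CLOCK_PERIOD_PS, SETUP_TIME_PS, t) for t in arrival_times_ps
]
wns = worst_negative_slack(slacks)
tns = total_negative_slack(slacks)

print("slacks (ps):", slacks)
print("WNS (ps):", wns)
print("TNS (ps):", tns)
```

With these made-up numbers, three endpoints violate setup timing, giving a WNS of -80 ps and a TNS of -120 ps. WNS shows how far the single worst path is from closing, while TNS indicates how widespread the violations are.
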

Routing
As explained extensively in (2, section 13.2.9), the goal here is to add wires to the design. The text also cautions that, by this point, all timing problems should have been fixed in the previous stages, which made me backtrack a little to make sure that all my timing requirements were in good order. Upon completion of the routing procedure, filler cells (called spare modules) were added to fill the empty gaps that appeared in the visual representation of the final chip. I experimented with adding more decoupling capacitors and NAND and OR gates, among a few others.

Verification of the results in their respective categories resulted in no violations except the geometrical ones. Part of the explanation was that, because we were using a version of Cadence that is not optimized for production, those errors simply could not be eliminated.

Conclusion
I learned a lot from working with an EDA tool like Cadence from start, when I had only the drawing board for what to achieve, to finish. Most of the challenges faced along the way are hard to imagine if one has no access to these tools, but surprisingly, they have many analogies in the real world. Understanding these analogies helped me see why I needed to take the steps I took throughout the whole exercise.

Indeed, nothing can be done with precision in the world of semiconductor engineering without the use of an EDA tool, and this lab underlined to me the role these tools play in getting the desired outcome.

References
[1] Louis K. Scheffer, Igor L. Markov, Luciano Lavagno, Grant Martin, Electronic Design Automation for IC System Design, Verification, and Testing, 2nd ed. CRC Press, Taylor & Francis Group, vol. 1.
[2] Luciano Lavagno, Grant Martin, Louis K. Scheffer, Igor L. Markov, Electronic Design Automation for IC Implementation, Circuit Design, and Process Technology, 2nd ed. CRC Press, Taylor & Francis Group, vol. 2.
[3] Andrew B. Kahng, Jens Lienig, Igor L. Markov, Jin Hu, VLSI Physical Design: From Graph Partitioning to Timing Closure, 2nd ed. Springer, 2011.
[4] Y. Nasser, J. Lorandel, J.-C. Prévotet, and M. Hélard, “RTL to transistor level power modeling and estimation techniques for FPGA and ASIC: A survey,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 40, no. 3, pp. 479–493, 2021.
[5] K. Johansson, O. Gustafsson, and L. Wanhammar, “Power estimation for ripple-carry adders with correlated input data,” vol. 3254, Sep. 2004, pp. 662–674.
