DQ42
1. INTRODUCTION
Multipliers are among the most significant blocks in computer arithmetic and are widely used in digital signal processors. There is a growing demand for high-speed multipliers in different applications of computing systems, such as computer graphics, scientific calculation, and image processing. The speed of the multiplier determines how fast a processor will run, and designers now focus on high speed together with low power consumption. A multiplier architecture consists of a partial product generation stage, a partial product reduction stage, and a final addition stage. The partial product reduction stage is responsible for a significant portion of the total multiplication delay, power, and area. Therefore, compressors are usually used to implement this stage: they reduce the number of partial products and also shorten the critical path, which is important for maintaining the circuit's performance.
This is accomplished by the use of 3-2, 4-2, and 5-2 compressor structures. A 3-2 compressor circuit is also known as a full adder cell. Because these compressors are used repeatedly in larger systems, an improved design contributes substantially to overall system performance. The internal structure of compressors is basically composed of XOR-XNOR gates and multiplexers. XOR-XNOR circuits are also building blocks of various other circuits such as arithmetic circuits, multipliers, compressors, parity checkers, etc. An optimized design of these XOR-XNOR gates can improve the performance of the multiplier circuit. In the present work, a new XOR-XNOR module is proposed and a 4-2 compressor is implemented using this module. Using the proposed circuit in partial product accumulation reduces the transistor count as well as the power consumption.
Addition and multiplication are widely used operations in computer arithmetic; for addition, full-adder cells have been extensively analyzed for approximate computing. Liang et al. have compared these adders and proposed several new metrics for evaluating approximate and probabilistic adders with respect to unified figures of merit for inexact computing applications. For each input to a circuit, the error distance (ED) is defined as the arithmetic distance between an erroneous output and the correct one. The mean error distance (MED) and normalized error distance (NED) are proposed by considering the averaging effect of multiple inputs and the normalization of multiple-bit adders. The NED is nearly invariant with the size of an implementation and is therefore useful in the reliability assessment of a specific design. The tradeoff between precision and power has also been quantitatively evaluated.
However, the design of approximate multipliers has received less attention. Multiplication can be thought of as the repeated addition of partial products; however, the straightforward application of approximate adders when designing an approximate multiplier is not viable, because it would be very inefficient in terms of precision, hardware complexity, and other performance metrics. Several approximate multipliers have been proposed. Most of these designs use a truncated multiplication method: they estimate the least significant columns of the partial products as a constant. An imprecise array multiplier has been used for neural network applications by omitting some of the least significant bits in the partial products (and thus removing some adders in the array). A truncated multiplier with a correction constant has also been proposed.
2. COMPRESSOR
Braun Multiplier:
Variables with bars denote prior inversion. Inverters are connected before the inputs of the full adders or the AND gates as required by the algorithm. Each column represents the addition in accordance with the respective weight of the product term.
3:2 Compressor:
Multipliers are essential components in many digital circuits that perform arithmetic operations. A multiplier consists of three fundamental parts: a partial product generator, a partial product reduction stage, and a final fast adder. A Booth encoder is used to generate the partial products, and the partial products are reduced to two rows using compressor circuits. Finally, a fast adder is used to sum the two rows.
The partial product reduction part of the multiplier contributes most of the power consumption, delay, and layout area. Various high-speed multipliers use 3-2, 4-2, and 5-2 compressors to lower the latency of the partial product reduction part. These compressors minimize delay and area, which increases the performance of the overall system. Compressors are generally built from XOR-XNOR gates and multiplexers. A compressor is a device used to reduce the number of operands while adding the terms of the partial products in a multiplier. An X-Y compressor takes X equally weighted input bits and produces a Y-bit binary number. The most widely used and simplest compressor is the 3-2 compressor, which is also known as a full adder. A 3-2 compressor has three inputs X1, X2, X3 and generates two outputs, the sum and the carry bits. The block diagram of the 3-2 compressor is shown in the figure.
Fig: 3:2 Compressor
A 3-2 compressor cell can be implemented in many different logic structures. In general, however, it is composed of three main modules. The first module generates the XOR or XNOR function, or both of them. The second module generates the sum, and the last module produces the carry output. The 3-2 compressor can also be implemented as a full adder cell when its third input is taken as the carry input from the preceding compressor block, i.e., X3 = Cin. The basic equation of the 3-2 compressor is:
X1 + X2 + X3 = Sum + 2 × Carry (1)
Fig: Conventional 3-2 compressor
The conventional architecture of the 3-2 compressor, shown in figure 2(a), has two XOR gates in the critical path. The sum output is generated by the second XOR gate and the carry output is generated by the multiplexer (MUX). The equations governing the conventional 3-2 compressor outputs are:
Sum = (X1 ⊕ X2) ⊕ X3
Carry = (X1 ⊕ X2)X3 + (X1 ⊕ X2)′X1
The 3-2 compressor architecture shown in figure 2(b) has a smaller delay than the other architectures, because some of the XOR circuits are replaced by multiplexer circuits. In this compressor the select bit of the multiplexer is available before the inputs arrive, which reduces the delay. Thus the switching time of the transistors in the critical path is decreased, and the overall delay is reduced by a significant amount. This architecture has a critical path delay of Δ-XOR + Δ-MUX. The output functions of the modified 3-2 compressor circuit are logically the same; the carry is simply obtained by a multiplexer that selects X3 when X1 ⊕ X2 = 1 and X1 otherwise.
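As an illustration, a minimal Verilog sketch of a 3-2 compressor using the XOR-plus-MUX structure described above is given below; the module and signal names (compressor_3_2, x1-x3) are ours and not taken from the original design.
module compressor_3_2 (
    input  wire x1, x2, x3,
    output wire sum, carry
);
    wire p;                      // propagate signal, x1 XOR x2
    assign p     = x1 ^ x2;
    assign sum   = p ^ x3;       // Sum   = x1 xor x2 xor x3
    assign carry = p ? x3 : x1;  // Carry = MUX(x1, x3) selected by p
endmodule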
3. PROPOSED MODEL
The overall structure and the details of the suggested dual-quality approximate compressors are described in this section.
A. Exact 4:2 Compressor
To reduce the delay of the partial product summation stage of parallel multipliers, 4:2 and
5:2 compressors are widely employed. Some compressor structures, which have been optimized
for one or more design parameters (e.g., delay, area, or power consumption), have been
proposed. The focus of this paper is on approximate 4:2 compressors. First, some background on
the exact 4:2 compressor is presented. This type of compressor, shown schematically in Fig. 1,
has four inputs (x1–x4) along with an input carry (Cin), and two outputs (sum and carry) along
with an output Cout. The internal structure of an exact 4:2 compressor is composed of two
serially connected full adders.
In this structure, the weights of all the inputs and the sum output are the same whereas the
weights of the carry and Cout outputs are one binary bit position higher. The outputs sum, carry,
and Cout are obtained from
sum = x1 ⊕ x2 ⊕ x3 ⊕ x4 ⊕ Cin (1)
carry = (x1 ⊕ x2 ⊕ x3 ⊕ x4)Cin + (x1 ⊕ x2 ⊕ x3 ⊕ x4)′x4 (2)
Cout = (x1 ⊕ x2)x3 + (x1 ⊕ x2)′x1 (3)
where ′ denotes the logical complement.
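For reference, the Verilog sketch below describes the exact 4:2 compressor of Fig. 1 as two cascaded full adders, consistent with Eqs. (1)-(3); the module name exact_4_2 is our own.
module exact_4_2 (
    input  wire x1, x2, x3, x4, cin,
    output wire sum, carry, cout
);
    wire s1;                              // sum of the first full adder
    assign {cout, s1}   = x1 + x2 + x3;   // first full adder
    assign {carry, sum} = s1 + x4 + cin;  // second full adder
endmodule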
Fig. 3. Block diagram of the proposed approximate 4:2 compressors. The hachured box in the approximate part indicates the components that are not shared between this part and the supplementary part.
We use the power gating technique to turn OFF the unused components of the approximate part. Also note that, as is evident from Fig. 3, in the exact operating mode, tristate buffers are utilized to disconnect the outputs of the approximate part from the primary outputs. In this design, switching between the approximate and exact operating modes is fast. Thus, it provides the opportunity of designing parallel multipliers that are capable of switching between different accuracy levels at runtime. Next, we discuss the details of our four DQ4:2Cs based on the diagram shown in Fig. 3. The structures have different accuracies, delays, power consumptions, and area usages. Note that the i-th proposed structure is denoted by DQ4:2Ci. The basic idea behind the suggested approximate compressors is to minimize the difference (error) between the outputs of the exact and approximate ones. Therefore, in order to choose proper approximate designs for the compressors, an extensive search was performed. During the search, the truth table of the exact 4:2 compressor was used as the reference.
1) Structure 1 (DQ4:2C1): For the approximate part of the first proposed DQ4:2C structure, as shown in Fig. 4(a), the approximate output carry (i.e., carry_) is directly connected to the input x4 (carry_ = x4), and, in a similar way, the approximate output sum (i.e., sum_) is directly connected to the input x1 (sum_ = x1). In the approximate part of this structure, the output Cout is ignored. While the approximate part of this structure is considerably fast and low power, its error rate is large (62.5%).
Fig. 5. (a) Approximate part and (b) overall structure of DQ4:2C2.
The supplementary part of this structure is an exact 4:2 compressor. The overall structure of DQ4:2C1 is shown in Fig. 4(b). In the exact operating mode, the delay of this structure is about the same as that of the exact 4:2 compressor.
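A behavioral sketch of how DQ4:2C1 could be written in Verilog is given below. The mode input selects between the approximate part (sum = x1, carry = x4, Cout ignored) and the exact supplementary part; the power-gating and tristate details of Fig. 4(b) are not modeled, the module name dq42c1 is ours, and exact_4_2 refers to the sketch above.
module dq42c1 (
    input  wire x1, x2, x3, x4, cin,
    input  wire mode,                  // 1 = exact mode, 0 = approximate mode
    output wire sum, carry, cout
);
    wire sum_e, carry_e, cout_e;
    // supplementary (exact) part
    exact_4_2 u_exact (.x1(x1), .x2(x2), .x3(x3), .x4(x4), .cin(cin),
                       .sum(sum_e), .carry(carry_e), .cout(cout_e));
    assign sum   = mode ? sum_e   : x1;    // approximate sum   = x1
    assign carry = mode ? carry_e : x4;    // approximate carry = x4
    assign cout  = mode ? cout_e  : 1'b0;  // Cout ignored in approximate mode
endmodule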
Fig. 6. (a) Approximate part of DQ4:2C3 and (b) overall structure of DQ4:2C3.
3) Structure 3 (DQ4:2C3): The previous structures, in the approximate operating mode, had
maximum power and delay reductions compared with those of the exact compressor. In some
applications, however, a higher accuracy may be needed. In the third structure, the accuracy of
the approximate operating mode is improved by increasing the complexity of the approximate
part whose internal structure is shown in Fig. 6(a). In this structure, the accuracy of output sum_
is increased. Similar to DQ4:2C1, the approximate part of this structure does not support output
Cout. The error rate of this structure, however, is reduced to 50%.
The overall structure of DQ4:2C3 is shown in Fig. 6(b) where the supplementary part is
enclosed in a red dashed line rectangle. Note that in this structure, the utilized NAND gate of the
approximate part (denoted by a blue dotted line rectangle) is not used during the exact operating
mode. Hence, during this operating mode, we suggest disconnecting the supply voltage of this gate by using power gating.
Fig. 7. (a) Approximate part of DQ4:2C4 and (b) overall structure of DQ4:2C4.
4) Structure 4 (DQ4:2C4): In this structure, we improve the accuracy of the output carry_ compared with that of DQ4:2C3 at the cost of larger delay and power consumption; the error rate is reduced to 31.25%. The internal structure of the approximate part and the overall structure of DQ4:2C4 are shown in Fig. 7. The supplementary part is indicated by the red dashed-line rectangle, while the gates of the approximate part, powered OFF during the exact operating mode, are indicated by the blue dotted line.
Note that the error rate corresponds to the occurrence of errors in the output over the complete input range.
The output quality is determined by the error distance (ED) parameter, which is the difference between the exact output and the output of the approximate unit. In addition to the ED, there are other closely related parameters, namely, the normalized ED (NED) and the mean relative ED (MRED), which are more important in determining the output quality.
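One way to evaluate such metrics for a single compressor is an exhaustive comparison against the exact design. The testbench below is a sketch that assumes the dq42c1 and exact_4_2 modules sketched earlier, sweeps all 32 input combinations in the approximate mode, and reports the resulting error rate and mean ED.
module tb_error_metrics;
    reg  [4:0] in;                    // {x4, x3, x2, x1, cin}
    wire sum_a, carry_a, cout_a;
    wire sum_e, carry_e, cout_e;
    integer i, errors, ed_total, exact_val, approx_val;

    dq42c1 dut (.x1(in[1]), .x2(in[2]), .x3(in[3]), .x4(in[4]), .cin(in[0]),
                .mode(1'b0),
                .sum(sum_a), .carry(carry_a), .cout(cout_a));
    exact_4_2 ref_c (.x1(in[1]), .x2(in[2]), .x3(in[3]), .x4(in[4]), .cin(in[0]),
                     .sum(sum_e), .carry(carry_e), .cout(cout_e));

    initial begin
        errors = 0; ed_total = 0;
        for (i = 0; i < 32; i = i + 1) begin
            in = i;
            #1;  // let the combinational logic settle
            // weights: sum = 1; carry and cout = 2 (one bit position higher)
            exact_val  = sum_e + 2*carry_e + 2*cout_e;
            approx_val = sum_a + 2*carry_a;      // Cout unused in approx. mode
            if (exact_val != approx_val) errors = errors + 1;
            ed_total = ed_total + ((exact_val > approx_val) ?
                                   (exact_val - approx_val) :
                                   (approx_val - exact_val));
        end
        $display("error rate = %0d/32, mean ED = %f", errors, ed_total / 32.0);
        $finish;
    end
endmodule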
4. MULTIPLIER REALIZED BY THE PROPOSED COMPRESSORS
In this section, first, the accuracy metrics considered in this paper are introduced. Next, the accuracy of 8-, 16-, and 32-bit Dadda multipliers realized by the proposed compressors is studied. A proper combination of the proposed compressors may be utilized to achieve a better tradeoff between the accuracy and the design parameters. As an option, the use of DQ4:2C1 for the LSB part and DQ4:2C4 for the MSB part of the multiplication is suggested here.
Fig. 8. Reduction circuitry of an 8-bit Dadda multiplier.
The results for this multiplier are denoted by DQ4:2Cmixed. These multipliers are compared with the approximate Dadda multipliers implemented with the two previously proposed approximate 4:2 compressors discussed above, as well as with the configurable multiplier suggested in prior work. In addition, some state-of-the-art approximate multiplier designs that do not use approximate compressors are considered. These multipliers include the 32-bit unsigned ROBA (U-ROBA), SSM with a segment size of 8 (SSM8), and DRUM with a segment size of 6 (DRUM6). The general structure of the reduction circuitry in an 8-bit Dadda multiplier that makes use of 4:2 compressors is drawn in Fig. 8.
5. INTRODUCTION TO VLSI
Very-large-scale integration (VLSI) is the process of creating an integrated circuit by combining thousands of transistor-based circuits into a single chip. VLSI began in the 1970s when complex semiconductor and communication technologies were being developed. The microprocessor is a VLSI device. The term is no longer as common as it once was, as chips have increased in complexity into billions of transistors.
5.1 Overview:
The first semiconductor chips held one transistor each. Subsequent advances added more and more transistors, and, as a consequence, more individual functions or systems were integrated over time. The first integrated circuits held only a few devices, perhaps as many as ten diodes, transistors, resistors and capacitors, making it possible to fabricate one or more logic gates on a single device. Later improvements in technique led to devices with hundreds of logic gates, known as large-scale integration (LSI), i.e. systems with at least a thousand logic gates. Current technology has moved far past this mark and today's microprocessors have many millions of gates and hundreds of millions of individual transistors.
At one time, there was an effort to name and calibrate various levels of large-scale
integration above VLSI. Terms like Ultra-large-scale Integration (ULSI) were used. But the huge
number of gates and transistors available on common devices has rendered such fine distinctions
moot.
Terms suggesting greater than VLSI levels of integration are no longer in widespread use. Even VLSI is now somewhat quaint, given the common assumption that all modern microprocessors are VLSI or beyond. Billion-transistor processors are commercially available, an example of which is Intel's Montecito Itanium chip. This is expected to become more commonplace as semiconductor fabrication advances from the current generation of 65 nm processes to the next 45 nm generations (while experiencing new challenges such as increased variation across process corners). Another notable example is NVIDIA's 280 series GPU, whose 1.4 billion transistors are almost entirely dedicated to logic (Itanium's transistor count is largely due to its 24 MB L3 cache). Current designs, as opposed to the earliest devices, use extensive design automation and automated logic synthesis to lay out the transistors, enabling higher levels of complexity in the resulting logic functionality. Certain high-performance logic blocks, like the SRAM cell, are still designed by hand to ensure the highest efficiency (sometimes by bending or breaking established design rules to obtain the last bit of performance by trading stability).
VLSI stands for "Very Large Scale Integration". This is the field which involves packing more and more logic devices into smaller and smaller areas. In VLSI, extremely small, complex circuitry is fabricated on a single chip of semiconductor material; an integrated circuit (IC) may contain millions of transistors, each only a few micrometres in size.
The capabilities of integrated circuit technology now largely determine the architecture of the entire system. Integrated circuits improve system characteristics in several critical ways. ICs have three key advantages over digital circuits built from discrete components:
Size. Integrated circuits are much smaller; both transistors and wires are shrunk to micrometre sizes, compared with the millimetre or centimetre scales of discrete components. Small size leads to advantages in speed and power consumption, since smaller components have smaller parasitic resistances, capacitances, and inductances.
Speed. Signals can be switched between logic 0 and logic 1 much more quickly within a chip than they can between chips. Communication within a chip can occur hundreds of times faster than communication between chips on a printed circuit board. The high speed of circuits on-chip is due to their small size: smaller components and wires have smaller parasitic capacitances to slow down the signal.
Power consumption. Logic operations within a chip also take much less power. Once again, lower power consumption is largely due to the small size of circuits on the chip: smaller parasitic capacitances and resistances require less power to drive them.
5.5 VLSI and systems:
These advantages of integrated circuits translate into advantages at the system level:
Lower power consumption. Replacing a handful of standard parts with a single chip reduces total power consumption. Reducing power consumption has a ripple effect on the rest of the system: a smaller, cheaper power supply can be used; since less power consumption means less heat, a fan may no longer be necessary; and a simpler cabinet with less shielding for electromagnetic interference may be feasible.
Reduced cost. Reducing the number of components, the power supply requirements,
cabinet costs, and so on, will inevitably reduce system cost. The ripple effect of
integration is such that the cost of a system built from custom ICs can be less, even
though the individual ICs cost more than the standard parts they replace.
Understanding why integrated circuit technology has such a profound influence on the design of digital systems requires understanding both the technology of IC manufacturing and the economics of ICs and digital systems.
5.6 Applications:
Electronic systems now perform a wide variety of tasks in daily life. Electronic systems in some cases have replaced mechanisms that operated mechanically, hydraulically, or by other means; electronics are usually smaller, more flexible, and easier to service. In other cases electronic systems have created totally new applications. Electronic systems perform a variety of tasks, for example:
Electronic systems in cars operate stereo systems and displays; they also perform the control functions required for anti-lock braking (ABS) systems.
Personal computers provide word processing, financial analysis, and games. Computers include both central processing units (CPUs) and special-purpose hardware for disk access, faster screen display, etc.
Medical electronic systems measure bodily functions and perform complex processing algorithms to warn about unusual conditions.
The growing sophistication of applications continually pushes the design and manufacturing of integrated circuits and electronic systems to new levels of complexity. And perhaps the most amazing characteristic of this collection of systems is its variety: as systems become more complex, we build not a few general-purpose computers but an ever wider range of special-purpose systems. Our ability to do so is a testament to our growing mastery of both integrated circuit manufacturing and design, but the increasing demands of customers continue to test the limits of design and manufacturing.
5.7 ASIC:
An application-specific integrated circuit (ASIC) is an integrated circuit customized for a particular use, rather than intended for general-purpose use. For example, a chip designed solely to run a cell phone is an ASIC. Intermediate between ASICs and industry-standard integrated circuits, like the 7400 or the 4000 series, are application-specific standard products (ASSPs).
As feature sizes have shrunk and design tools have improved over the years, the maximum complexity (and hence functionality) possible in an ASIC has grown from 5,000 gates to over 100 million. Modern ASICs often include entire 32-bit processors and memory blocks including ROM, RAM, EEPROM, and Flash, among other large building blocks. Such an ASIC is often termed a SoC (system-on-a-chip). Designers of digital ASICs use a hardware description language (HDL), such as Verilog or VHDL, to describe the functionality of the ASIC. Field-programmable gate arrays (FPGAs) are the modern-day technology for building a breadboard or prototype from standard parts; programmable logic blocks and programmable interconnects allow the same FPGA to be used in many different applications. For smaller designs and/or lower production volumes, FPGAs may be more cost-effective than an ASIC design.
Structured ASICs are used mainly for mid-volume designs. The design task for structured ASICs is to map the circuit onto a fixed arrangement of known cells.
6. INTRODUCTION TO XILINX
When you open a project file from a previous release, the ISE® software prompts you to
migrate your project. If you click Backup and Migrate or Migrate Only, the software
automatically converts your project file to the current release. If you click Cancel, the software
does not convert your project and, instead, opens Project Navigator with no project loaded.
Note: After you convert your project, you cannot open it in previous versions of the ISE software, such as the ISE 11 software. However, you can optionally create a backup of the original project as part of the migration.
To Migrate a Project
1. Select File > Open Project.
2. In the Open Project dialog box, select the .xise file to migrate.
Note You may need to change the extension in the Files of type field to display .npl
(ISE 5 and ISE 6 software) or .ise (ISE 7 through ISE 10 software) project files.
3. In the dialog box that appears, select Backup and Migrate or Migrate Only.
Note If you chose to Backup and Migrate, a backup of the original project is created at
project_name_ise12migration.zip.
6.2 Properties:
For information on properties that have changed in the ISE 12 software, see the ISE 11 to ISE 12 property conversion information in the ISE Help.
6.3 IP Modules:
If your design includes IP modules that were created using CORE Generator™ software
or Xilinx® Platform Studio (XPS) and you need to modify these modules, you may be required
to update the core. However, if the core netlist is present and you do not need to modify the
core, updates are not required and the existing netlist is used during implementation.
The ISE 12 software supports all of the source types that were supported in the ISE 11
software.
If you are working with projects from previous releases, state diagram source files (.dia),
ABEL source files (.abl), and test bench waveform source files (.tbw) are no longer supported.
For state diagram and ABEL source files, the software finds an associated HDL file and adds it
to the project, if possible. For test bench waveform files, the software automatically converts the
TBW file to an HDL test bench and adds it to the project. To convert a TBW file after project migration, see the ISE Help.
To help familiarize you with the ISE® software and with FPGA and CPLD designs, a set
of example designs is provided with Project Navigator. The examples show different design
techniques and source types, such as VHDL, Verilog, schematic, or EDIF, and include different target devices.
To Open an Example
1. Select File > Open Example.
2. In the Open Example dialog box, select the Sample Project Name.
Note To help you choose an example project, the Project Description field describes
each project. In addition, you can scroll to the right to see additional fields, which provide more information about each example.
3. In the Destination Directory field, enter a directory name or browse to the directory.
4. Click OK.
The example project is extracted to the directory you specified in the Destination
Directory field and is automatically opened in Project Navigator. You can then run processes on the example design.
Note If you modified an example project and want to overwrite it with the original
example project, select File > Open Example, select the Sample Project Name, and specify the
same Destination Directory you originally used. In the dialog box that appears, select Overwrite the existing project, and click OK.
Project Navigator allows you to manage your FPGA and CPLD designs using an ISE®
project, which contains all the source files and settings specific to your design. First, you must
create a project and then, add source files, and set process properties. After you create a project,
you can run processes to implement, constrain, and analyze your design. Project Navigator provides a New Project Wizard to step you through creating a project.
Note If you prefer, you can create a project using the New Project dialog box instead of
the New Project Wizard. To use the New Project dialog box, deselect the Use New Project
wizard option in the ISE General page of the Preferences dialog box.
To Create a Project
1. Select File > New Project to launch the New Project Wizard.
2. In the Create New Project page, set the name, location, and project type, and
click Next.
3. For EDIF or NGC/NGO projects only: In the Import EDIF/NGC Project page,
select the input and constraint file for the project, and click Next.
4. In the Project Settings page, set the device and project properties, and click
Next.
5. In the Project Summary page, review the information, and click Finish to create the project.
Project Navigator creates the project file (project_name.xise) in the directory you specified. After you add source files to the project, the files appear in the Hierarchy pane of the Design panel.
Project Navigator manages your project based on the design properties (top-level module
type, device type, synthesis tool, and language) you selected when you created the project. It
organizes all the parts of your design and keeps track of the processes necessary to move the
design from design entry through implementation to programming the targeted Xilinx® device.
Note For information on changing design properties, see Changing Design Properties.
You can create a copy of a project to experiment with different source options and implementations. Depending on your needs, the design source files for the copied project can be handled in one of the following ways:
Design source files are left in their existing location, and the copied project points to these files.
Design source files, including generated files, are copied and placed in a specified directory.
Design source files, excluding generated files, are copied and placed in a specified directory.
Copied projects are the same as other projects in both form and function. For example, you can use the Project Browser to view key summary data for the copied project and then open the copied project for further analysis and implementation.
Alternatively, you can create an archive of your project, which puts all of the project
contents into a ZIP file. Archived projects must be unzipped before being opened in Project Navigator.
To Copy a Project
1. Select File > Copy Project.
2. In the Copy Project dialog box, enter the Name for the copy.
Note The name for the copy can be the same as the name for the project, as long as you specify a different location.
By default, this is blank, and the working directory is the same as the project directory. However, you can specify a working directory if you want to keep your ISE® project file separate from the generated output files.
The description can be useful in identifying key traits of the project for reference later.
Keep sources in their existing location - to leave the design source files in their existing location.
If you select this option, the copied project points to the files in their existing location. If you edit the files in the copied project, the changes also appear in the original project, because the source files are shared between the two projects.
Copy sources to the new location - to make a copy of all the design source files and place them in the specified directory.
If you select this option, the copied project points to the files in the specified directory. If you edit the files in the copied project, the changes do not appear in the original project, because the source files are not shared between the two projects.
Optionally, select Copy files from Macro Search Path directories to copy files from
the directories you specify in the Macro Search Path property in the Translate Properties dialog
box. All files from the specified directories are copied, not just the files used by the design.
Note: If you added a net list source file directly to the project as described in Working
with Net list-Based IP, the file is automatically copied as part of Copy Project because it is a
project source file. Adding net list source files to the project is the preferred method for
incorporating net list modules into your design, because the files are managed automatically by
Project Navigator.
Optionally, click Copy Additional Files to copy files that were not included in the
original project. In the Copy Additional Files dialog box, use the Add Files and Remove Files
buttons to update the list of additional files to copy. Additional files are copied to the copied
project location after all other files are copied. To exclude generated files from the copy, such as implementation results and reports, deselect the option to copy generated files. When you exclude generated files, the copied project opens in a state in which processes have not yet been run.
7. To automatically open the copy after creating it, select Open the copied project.
Note By default, this option is disabled. If you leave this option disabled, the original project remains open in Project Navigator.
Click OK.
A project archive is a single, compressed ZIP file with a .zip extension. By default, it
contains all project files, source files, and generated files, including the following:
Remote sources
Generated files
Non-project files
To Archive a Project
1. Select Project > Archive.
2. In the Project Archive dialog box, specify a file name and directory for the ZIP
file.
A ZIP file is created in the specified directory. To open the archived project, you must
first unzip the ZIP file, and then, you can open the project.
Note Sources that reside outside of the project directory are copied into a remote_sources subdirectory in the project archive. When the archive is unzipped and opened, you must either specify the location of these files in the remote_sources subdirectory for the unzipped project, or move the files back to their original location.
7. INTRODUCTION TO VERILOG
Overview
The designers of Verilog wanted a language with syntax similar to the C programming language, which was already widely used in engineering software development. Verilog is case-sensitive, has a basic preprocessor (though less sophisticated than that of ANSI C/C++), equivalent control flow keywords (if/else, for, while, case, etc.), and compatible operator precedence. Syntactic differences include variable declaration (Verilog requires bit-widths on net/reg types), demarcation of procedural blocks (begin/end instead of curly braces {}), and many other minor differences.
Verilog's concept of 'wire' consists of both signal values (4-state: "1, 0, floating,
undefined") and strengths (strong, weak, etc.). This system allows abstract modeling of shared
signal lines, where multiple sources drive a common net. When a wire has multiple drivers, the
wire's (readable) value is resolved by a function of the source drivers and their strengths.
Beginning
Verilog was the first modern hardware description language to be invented. It was created by Phil Moorby and Prabhu Goel during the winter of 1983/1984 at Automated Integrated Design Systems (renamed Gateway Design Automation in 1985) as a hardware modeling language. Gateway Design Automation was purchased
by Cadence Design Systems in 1990. Cadence now has full proprietary rights to Gateway's
Verilog and the Verilog-XL, the HDL-simulator that would become the de-facto standard (of
Verilog logic simulators) for the next decade. Originally, Verilog was intended to describe and
allow simulation; only afterwards was support for synthesis added.
Verilog-95
With the increasing success of VHDL at the time, Cadence decided to make the language
available for open standardization. Cadence transferred Verilog into the public domain under
the Open Verilog International (OVI) (now known as Accellera) organization. Verilog was later
submitted to IEEE and became IEEE Standard 1364-1995, commonly referred to as Verilog-95.
In the same time frame Cadence initiated the creation of Verilog-A to put standards
support behind its analog simulator Spectre. Verilog-A was never intended to be a standalone
language and is a subset of Verilog-AMS which encompassed Verilog-95.
Verilog 2001
Extensions to Verilog-95 were submitted back to IEEE to cover the deficiencies that
users had found in the original Verilog standard. These extensions became IEEE Standard 1364-
2001 known as Verilog-2001.
Verilog-2001 is a significant upgrade from Verilog-95. First, it adds explicit support for
(2's complement) signed nets and variables. Previously, code authors had to perform signed
operations using awkward bit-level manipulations (for example, the carry-out bit of a simple 8-
bit addition required an explicit description of the Boolean algebra to determine its correct
value). The same function under Verilog-2001 can be more succinctly described by one of the
built-in operators: +, -, /, *, >>>. A generate/endgenerate construct (similar to VHDL's
generate/endgenerate) allows Verilog-2001 to control instance and statement instantiation
through normal decision operators (case/if/else). Using generate/endgenerate, Verilog-2001 can
instantiate an array of instances, with control over the connectivity of the individual instances.
File I/O has been improved by several new system tasks. And finally, a few syntax additions
were introduced to improve code readability (e.g. always @*, named parameter override, C-style
function/task/module header declaration).
Verilog 2005
Not to be confused with SystemVerilog, Verilog 2005 (IEEE Standard 1364-2005) consists of
minor corrections, spec clarifications, and a few new language features (such as the uwire
keyword).
A separate part of the Verilog standard, Verilog-AMS, attempts to integrate analog and mixed
signal modeling with traditional Verilog.
SystemVerilog
SystemVerilog is a superset of Verilog-2005, with many new features and capabilities to aid
design verification and design modeling. As of 2009, the SystemVerilog and Verilog language
standards were merged into SystemVerilog 2009 (IEEE Standard 1800-2009).
In the late 1990s, the Verilog Hardware Description Language (HDL) became the most
widely used language for describing hardware for simulation and synthesis. However, the first
two versions standardized by the IEEE (1364-1995 and 1364-2001) had only simple constructs
for creating tests. As design sizes outgrew the verification capabilities of the language,
commercial Hardware Verification Languages (HVL) such as Open Vera and e were created.
Companies that did not want to pay for these tools instead spent hundreds of man-years creating
their own custom tools. This productivity crisis (along with a similar one on the design side) led
to the creation of Accellera, a consortium of EDA companies and users who wanted to create the
next generation of Verilog. The donation of the Open-Vera language formed the basis for the
HVL features of SystemVerilog. Accellera's goal was met in November 2005 with the adoption
of the IEEE standard P1800-2005 for SystemVerilog, IEEE (2005).
The most valuable benefit of SystemVerilog is that it allows the user to construct reliable,
repeatable verification environments, in a consistent syntax, that can be used across multiple
projects.
Some of the typical features of an HVL that distinguish it from a Hardware Description
Language such as Verilog or VHDL are
Constrained-random stimulus generation
Functional coverage
Higher-level structures, especially Object Oriented Programming
Multi-threading and interprocess communication
Support for HDL types such as Verilog’s 4-state values
Tight integration with event-simulator for control of the design
There are many other useful features, but these allow you to create test benches at a
higher level of abstraction than you are able to achieve with an HDL or a programming language
such as C.
System Verilog provides the best framework to achieve coverage-driven verification (CDV).
CDV combines automatic test generation, self-checking testbenches, and coverage metrics to
significantly reduce the time spent verifying a design. The purpose of CDV is to:
Receive early error notifications and deploy run-time checking and error analysis to
simplify debugging.
Examples
module toplevel(clock, reset);
 input clock, reset;
 reg flop1, flop2;
 always @(posedge clock) begin
 flop1 <= flop2; // non-blocking: both right-hand sides are read first,
 flop2 <= flop1; // so the two registers exchange values every clock
 end
endmodule
The other assignment operator, "=", is referred to as a blocking assignment. When "="
assignment is used, for the purposes of logic, the target variable is updated immediately. In the
above example, had the statements used the "=" blocking operator instead of "<=", flop1 and
flop2 would not have been swapped. Instead, as in traditional programming, the compiler would
understand to simply set flop1 equal to flop2 (and subsequently ignore the redundant logic to set
flop2 equal to flop1.)
module parameterized;
 parameter size = 5;
 parameter length = 20;
 // parameters are compile-time constants that can be overridden per instance
endmodule
Ex4: An example of delays:
...
reg a, b, c, d;
wire e;
...
always @(b or e)
begin
a = b & e;
b = a | b;
#5 c = b;
d = #6 c ^ e;
end
The always clause above illustrates the other type of usage: it executes whenever any of the entities in its sensitivity list changes, i.e., whenever b or e changes. When one of these changes, a is immediately assigned a new value, and, due to the blocking assignment, b is assigned a new value afterward (taking into account the new value of a). After a delay of 5 time units, c is assigned the value of b, and the value of c ^ e is tucked away in an invisible store. Then, after 6 more time units, d is assigned the value that was tucked away.
Signals that are driven from within a process (an initial or always block) must be of type reg.
Signals that are driven from outside a process must be of type wire. The keyword reg does not
necessarily imply a hardware register.
7.3 Constants
The definition of constants in Verilog supports the addition of a width parameter. The basic syntax is <width in bits>'<base letter><number>, where the base letter is b (binary), o (octal), d (decimal), or h (hexadecimal).
Examples:
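A few representative declarations are sketched below; the signal names are hypothetical.
wire [11:0] bus_a  = 12'h123;   // 12-bit constant, hexadecimal value 123
wire [19:0] bus_b  = 20'd44;    // 20-bit constant, decimal value 44
reg  [3:0]  nibble = 4'b1010;   // 4-bit constant, binary value 1010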
There are several statements in Verilog that have no analog in real hardware, e.g. $display. Consequently, much of the language cannot be used to describe hardware. The examples presented here are the classic subset of the language that has a direct mapping to real gates.
The next interesting structure is a transparent latch; it will pass the input to the output
when the gate signal is set for "pass-through", and captures the input and stores it upon transition
of the gate signal to "hold". The output will remain stable regardless of the input signal while the
gate is set to "hold". In the example below the "pass-through" level of the gate would be when
the value of the if clause is true, i.e. gate = 1. This is read "if gate is true, the din is fed to
latch_out continuously." Once the if clause is false, the last value at latch_out will remain and is
independent of the value of din.
EX6: // Transparent latch example
reg out;
always @(gate or din)
if(gate)
out = din; // Pass through state
// Note that the else isn't required here. The variable
// out will follow the value of din while gate is high.
// When gate goes low, out will remain constant.
The flip-flop is the next significant template; in Verilog, the D-flop is the simplest, and it can be
modeled as:
reg q;
always @(posedge clk)
q <= d;
The significant thing to notice in the example is the use of the non-blocking assignment.
A basic rule of thumb is to use <= when there is a posedge or negedge statement within the
always clause.
A variant of the D-flop is one with an asynchronous reset; there is a convention that the
reset state will be the first if clause within the statement.
reg q;
always @(posedge clk or posedge reset)
if(reset)
q <= 0;
else
q <= d;
The next variant is including both an asynchronous reset and asynchronous set condition; again
the convention comes into play, i.e. the reset term is followed by the set term.
reg q;
always @(posedge clk or posedge reset or posedge set)
if(reset)
q <= 0;
else
if(set)
q <= 1;
else
q <= d;
Note: If this model is used to model a Set/Reset flip flop then simulation errors can result.
Consider the following test sequence of events. 1) reset goes high 2) clk goes high 3) set goes
high 4) clk goes high again 5) reset goes low followed by 6) set going low. Assume no setup and
hold violations.
In this example the always @ statement would first execute when the rising edge of reset
occurs which would place q to a value of 0. The next time the always block executes would be
the rising edge of clk which again would keep q at a value of 0. The always block then executes
when set goes high which because reset is high forces q to remain at 0. This condition may or
may not be correct depending on the actual flip flop. However, this is not the main problem with
this model. Notice that when reset goes low, that set is still high. In a real flip flop this will cause
the output to go to a 1. However, in this model it will not occur because the always block is
triggered by rising edges of set and reset - not levels. A different approach may be necessary for
set/reset flip flops.
Note that there are no "initial" blocks mentioned in this description. There is a split
between FPGA and ASIC synthesis tools on this structure. FPGA tools allow initial blocks
where reg values are established instead of using a "reset" signal. ASIC synthesis tools don't
support such a statement. The reason is that an FPGA's initial state is something that is
downloaded into the memory tables of the FPGA. An ASIC is an actual hardware
implementation.
There are two separate ways of declaring a Verilog process. These are the always and
the initial keywords. The always keyword indicates a free-running process. The initial keyword
indicates a process executes exactly once. Both constructs begin execution at simulator time 0,
and both execute until the end of the block. Once an always block has reached its end, it is
rescheduled (again). It is a common misconception to believe that an initial block will execute
before an always block. In fact, it is better to think of the initial-block as a special-case of
the always-block, one which terminates after it completes for the first time.
//Examples:
initial
begin
a = 1; // Assign a value to reg a at time 0
#1; // Wait 1 time unit
b = a; // Assign the value of reg a to reg b
end
always @(posedge a)// Run whenever reg a has a low to high change
a <= b;
These are the classic uses for these two keywords, but there are two significant additional
uses. The most common of these is an always keyword without the @(...) sensitivity list. It is
possible to use always as shown below:
always
begin // Always begins executing at time 0 and NEVER stops
clk = 0; // Set clk to 0
#1; // Wait for 1 time unit
clk = 1; // Set clk to 1
#1; // Wait 1 time unit
end // Keeps executing - so continue back at the top of the begin
The always keyword acts similar to the "C" construct while(1) {..} in the sense that it will
execute forever.
The other interesting exception is the use of the initial keyword with the addition of
the forever keyword.
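For example, a free-running clock equivalent to the always block above can be sketched as:
initial
  forever begin // repeats forever, starting at time 0
    clk = 0;    // Set clk to 0
    #1;         // Wait for 1 time unit
    clk = 1;    // Set clk to 1
    #1;         // Wait 1 time unit
  end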
The order of execution isn't always guaranteed within Verilog. This can best be
illustrated by a classic example. Consider the code snippet below:
initial
a = 0;
initial
b = a;
initial
begin
#1;
$display("Value a=%b Value of b=%b",a,b);
end
What will be printed out for the values of a and b? Depending on the order of execution of the
initial blocks, it could be zero and zero, or alternately zero and some other arbitrary uninitialized
value. The $display statement will always execute after both assignment blocks have completed,
due to the #1 delay.
7.7 Operators
Bitwise:
  |           Bitwise OR
  ^           Bitwise XOR
  ~^ or ^~    Bitwise XNOR
Logical:
  !           NOT
  &&          AND
  ||          OR
Reduction:
  |           Reduction OR
  ~|          Reduction NOR
  ^           Reduction XOR
  ~^ or ^~    Reduction XNOR
Arithmetic:
  +           Addition
  -           Subtraction (also 2's complement negation)
  *           Multiplication
  /           Division
  **          Exponentiation (Verilog-2001)
Concatenation:
  { , }       Concatenation
Conditional:
  ?:          Conditional
System tasks are available to handle simple I/O, and various design measurement functions. All
system tasks are prefixed with $ to distinguish them from user tasks and functions. This section
presents a short list of the most often used tasks. It is by no means a comprehensive list.
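As a brief illustration, the sketch below uses a few of the most common tasks; clk and q are placeholder signals.
initial begin
  $display("Simulation started at time %0t", $time);  // print a message once
  $monitor("t=%0t clk=%b q=%b", $time, clk, q);       // reprint whenever a listed value changes
  #100 $finish;                                       // stop the simulation after 100 time units
end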