
IJCSI International Journal of Computer Science Issues, Vol.
9, Issue 4, No 1, July 2012

ISSN (Online): 1694-0814
www.IJCSI.org 395
A Proficient Design of Hybrid Synchronous and Asynchronous Digital FIR Filter using FPGA

Paulchamybalaiah (1) and Dr. Ila Vennila (2)

(1) Assistant Professor, Department of ECE, Hindusthan Institute of Technology, Coimbatore-32
(2) Assistant Professor, Department of EEE, PSG College of Technology, Coimbatore-32
Abstract

In this paper, a hybrid synchronous and asynchronous digital FIR filter is designed and implemented in an FPGA using VHDL. A digital FIR filter with high throughput and low latency, operating at above 1.3 GHz, was designed. An adaptive high-capacity pipeline was introduced in the hybrid synchronous-asynchronous design of the filter. The degree of pipelining is dynamically variable depending upon the input. Concurrent execution can be achieved in the FPGA through parallel processing. The designed digital FIR filter is simulated using ModelSim and implemented using Xilinx. Simulation results are presented for filter orders 3, 6 and 15. The designed FIR filter is synthesized in Xilinx 9.1i, and the device utilization report is presented for filters of order 3, 6 and 15.

Keywords: FPGA, Asynchronous pipeline, dynamic logic, FIR Filter

1. Introduction

Filters designed from a finite number of samples of the impulse response are termed finite impulse response (FIR) filters. An FIR filter is a non-recursive, discrete-time filter whose output depends only on present and previous inputs. Its purpose is to remove unwanted parts of the signal, such as random noise, and to extract useful parts of the signal, such as the components lying within a certain frequency range [1], [2]. FIR filters are inherently stable because all their poles are located at the origin and thus within the unit circle. FIR filters require no feedback, which means that rounding errors are not compounded by summed iterations. They can be designed to be linear phase by making the coefficient sequence symmetric; linear phase, or phase change proportional to frequency, corresponds to equal delay at all frequencies.

In signal processing, the function of a filter is to remove unwanted parts of the signal, such as random noise, and to extract useful parts of the signal, such as the components lying within a certain frequency range [3]. A digital filter uses a digital processor to perform numerical calculations on sampled values of the signal. The processor may be a general-purpose computer such as a PC or a specialized DSP (digital signal processor) chip.

The types of filter are as follows:
• Low pass filter: passes the low frequencies.
• High pass filter: passes the high frequencies and strongly attenuates the low ones.
• Band pass filter: passes the middle frequencies and attenuates the high ones and the low ones.

2. Overview of Digital Implementation of FIR

2.1 Digital Implementation of FIR Filter using DSP

Distributed Arithmetic (DA) has been used to implement a bit-serial scheme of a general asymmetric version of an FIR filter, taking optimal advantage of the 4-input LUT-based structure of FPGAs, so that a highly area-efficient multiplier-less FIR filter is designed. DSP functions can be implemented in FPGAs, which offer a balanced solution in comparison with traditional devices. Although ASICs and DSP chips have been the traditional solution for high-performance applications, technology and the market are now imposing new rules. On one hand, high development costs and time-to-market factors associated with ASICs can be prohibitive for certain applications and, on the other hand, programmable DSP processors can be unable to reach a desired performance due to their sequential-execution architecture. The research community has put great effort into designing efficient architectures for
Copyright (c) 2012 International Journal of Computer Science Issues. All Rights Reserved.
DSP functions such as FIR filters, which are extensively used in multiple applications in telecommunications, wireless or satellite communications, video and audio processing, biomedical signal processing and many others. Traditionally, design methods focused mainly on multiplier-based architectures to implement the multiply-and-accumulate (MAC) blocks that constitute the central piece of FIR filters and several other DSP functions. However, careful analysis shows that multiplier-based filter implementations may become highly expensive [2], [4].

2.2 Digital Implementation of FIR Filter using FPGA

FPGAs offer a very attractive solution that balances high flexibility, time-to-market, cost and performance. The cost of multiplier-based implementations has been partially addressed by the new generation of low-cost FPGAs that have embedded DSP blocks. However, if the final product will reside on an ASIC, for instance, the problem is still present. To resolve this issue, several multiplier-less schemes were proposed. Basically, these methods can be classified into two categories according to how they manipulate the filter coefficients for the multiply operation.

The first type of multiplier-less technique is the conversion-based approach, in which the coefficients are transformed to other numeric representations whose hardware implementation or manipulation is more efficient than the traditional binary representation. Examples of such techniques are the Canonic Signed Digit method, in which coefficients are represented by a combination of powers of two in such a way that multiplication can be implemented simply with adders/subtractors and shifters, and the Dempster-Macleod method, which similarly represents filter coefficients with powers of two but arranges partial results in cascade to introduce further savings in the usage of adders.

The second type of multiplier-less method involves the use of memories (RAMs, ROMs) or LUTs to store pre-computed values of coefficient operations. These are called memory-based methods. Examples are the Constant Coefficient Multiplier method and the well-known DA method. DA appeared as a very efficient solution especially suited for LUT-based FPGA architectures. This technique is a multiplier-less architecture based on an efficient partition of the function into partial terms using the 2's complement binary representation of data. The partial terms can be pre-computed and stored in LUTs. The flexibility of this algorithm on FPGAs permits everything from bit-serial implementations to pipelined or fully parallel versions of the scheme, which can greatly improve the design performance. The main problem with DA is that the required memory/LUT capacity increases exponentially with the order of the filter, given that DA implementations need 2^K words (K being the number of taps of the filter). That constitutes a first obstacle for FIR filters of high order. A flexible architecture that gradually replaces LUT requirements with multiplexer/adder pairs was introduced. An asymmetric FIR filter architecture using the bit-serial LUT-based DA technique is presented. For this implementation, we use a scheme that takes advantage of the 4-input LUTs in FPGAs and rearranges the input sequence to implement a modified version of the shifter/accumulator stage. We show that our modified version is superior in terms of area to previous LUT-less DA architectures [5], [7].

3. Problem Formulation

In the existing method, a fixed-order filter is used. The filter is a ten-tap, six-bit FIR filter. Partial sums are pre-computed and stored in a LUT, indexed by the input data values. A signed-digit offset binary notation is used, in which the symbols “0” and “1” stand for negative and positive coefficients of powers of 2. Figure 1 shows the existing fixed-tap mixed-mode filter.

[Figure: Filter Input → Odd and Even partial sum generator (left synchronous portion) → Adder (asynchronous portion) → Output Register (right synchronous portion) → Filter Output]

Fig. 1 Existing fixed tap Mixed Mode Filter

4. Proposed Methodology

In this method, a variable-order filter is proposed. Fig. 2 shows the programmable-tap fixed mode filter. The filter order can be changed to any value as required; for example, 3rd-order, 5th-order or 15th-order filters can be designed. Concurrent execution can be achieved in the FPGA through an adaptive high-capacity pipeline that performs parallel processing. With this methodology we aim to design a digital FIR filter operating at above 1.3 GHz.

[Figure: Filter Input → Odd and Even partial sum generator (left synchronous portion) → Adder (asynchronous portion) → Output Register (right synchronous portion) → Filter output]

Fig. 2 Proposed Programmable tap fixed mode Filter
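
As a behavioural illustration of the programmable-tap idea in Section 4, the following minimal Python sketch models a direct-form FIR filter whose order is set only by the length of the coefficient list. It is not the VHDL design of this paper; the function name fir_filter and the integer coefficients are hypothetical and chosen purely for illustration.

```python
from collections import deque


def fir_filter(samples, coeffs):
    """Direct-form FIR: y[n] = sum over k of coeffs[k] * x[n - k].

    The filter order is len(coeffs) - 1, so changing the coefficient
    list changes the order without touching the rest of the code.
    """
    # Shift register holding the most recent inputs, newest first.
    taps = deque([0] * len(coeffs), maxlen=len(coeffs))
    out = []
    for x in samples:
        taps.appendleft(x)  # the oldest sample falls off the right end
        out.append(sum(h * v for h, v in zip(coeffs, taps)))
    return out


# Impulse responses for two hypothetical coefficient sets (order 3 and 5).
impulse = [1] + [0] * 5
for order in (3, 5):
    coeffs = list(range(1, order + 2))  # illustrative integer weights
    print(order, fir_filter(impulse, coeffs))
```

Feeding a unit impulse returns the coefficient list itself (zero-padded or truncated to the input length), which is a quick check that the order really follows the number of taps supplied.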
5. Design Concepts of Digital FIR Filter

FIR filters are commonly designed on DSP and FPGA platforms. Therefore, the basic design concepts using DSP and FPGA are discussed below.

5.1 Design Concepts of Digital FIR Filter using DSP

One design method for FIR digital filters is based on a fixed-point DSP processor, in which the filter coefficients are obtained and verified with a DSP measuring system. All functions of the digital filter met design expectations. Filtering plays a significant role in digital signal processing. Digital filtering is a basic computation method for speech and image processing, pattern recognition, and spectrum analysis. This method has many advantages over an analogue filter, such as a wide design range, guaranteed precision and accurate linear phase, as well as freedom from voltage drift, temperature drift and noise. Since its unit impulse response is a finite-length sequence, an FIR filter is always stable. In addition to those advantages, digital filtering using a DSP chip is flexible, and it is convenient to change the filter's parameters and easy to modify its characteristics.

Methodologies for high-level synthesis of dedicated DSP architectures using the COMET design system are in use. The system is tuned to the synthesis of DSP ASICs from behavioral specifications written in VHDL. COMET is capable of generating more efficient architectures using innovative scheduling and resource allocation algorithms which exploit the cluster information and maximize the parallel tasks. With these transformations, major improvements are achieved with fewer registers and interconnections; an industrial-quality design is then derived in both FIR and elliptic filter examples. Filter banks are often used in signal and image processing applications for dividing a signal into frequency bands and reconstructing the signal from the individual bands. The Quadrature Mirror Filter, one particular application of the sub-band coding technique, has notable advantages for image compression and restoration compared with the Discrete Cosine Transform.

Silicon compilation has become essential to automate the VLSI design of DSP systems as chips increase in size and complexity. High-level synthesis, an important front-end task starting from an algorithmic behavioral specification, has received a lot of attention in both academic and industrial environments. Generally, the input description is converted into a Data Flow Graph and all synthesis tasks work from this Data Flow Graph. Behavioral synthesis is a complex task composed primarily of two interacting subtasks: scheduling and allocation. A great deal of progress has been made on the theory of high-level synthesis, with promising results [2, 4].

5.2 Design Concepts of Digital FIR Filter using FPGA

Implementation of the filter requires considerably fewer resources than the previous design using DSP. It requires about half the resources in terms of configurable blocks and lookup tables. The saving in the adder chain is not so high, since most of the adder tree size is dictated by the coefficient size, not by the sample size. The lookup tables must be writable. This increases complexity, especially in terms of routing resources. The mixer multiplier must be implemented using hard multipliers, not lookup tables. A single large lookup table to hold sine/cosine values is still needed. Especially for Altera FPGAs, this is a large advantage, as these chips have smaller RAM blocks, but also one or two large RAMs. Re-tuning the band is relatively slow [5, 11]. The filter has no capability for frequency hopping. This is not a requirement, and tap reloading is in any case faster than for a full 1024-tap filter. Some intelligence is needed in the control processor to recalculate filter taps from the low pass prototype, but this is within the capabilities of any current microprocessor. We used the ModelSim tool to determine the filter coefficients, designed a 16th-order constant-coefficient FIR filter in VHDL [3, 9], and simulated it; the results meet the performance requirements.

As the word indicates, a filter separates a desired signal from unwanted disturbances. When we want to remove a disturbance such as noise from an audio signal, we design an appropriate filter that passes only the desired signal. But only in a few cases can we remove the disturbance completely and recover the desired signal; most of the time we have to settle for a compromise in which most of the disturbance is rejected and most of the signal is recovered. The first candidate is a linear filter. The main reason for this choice is that we have a good understanding of how a linear system operates. It is only when a linear design fails or yields unsatisfactory results that we look for other solutions, such as nonlinear or adaptive techniques.

Digital filters include infinite impulse response (IIR) digital filters and finite impulse response (FIR) digital filters. Because FIR systems have many good features, such as having only zeros, system stability, fast operation, linear phase characteristics and design flexibility, FIR filters have been widely used in digital audio, image processing, data transmission, biomedical and other areas. FIR filters can be realized in a variety of ways. With the progress of modern electronic technology, the use of field programmable gate arrays (FPGAs) for digital signal processing has developed rapidly; given the high integration, high speed and reliability of FPGAs, FIR filter implementation using FPGAs is becoming a trend. An algorithm is proposed for the design of low complexity
linear phase finite impulse response (FIR) filters with optimum discrete coefficients. The proposed algorithm, based on mixed integer linear programming, efficiently traverses the discrete coefficient solutions and searches for the optimum one that results in an implementation using the minimum number of adders. During the search, discrete coefficients are dynamically synthesized based on a continuously updated subexpression space and, most essentially, a monitoring mechanism is introduced to make the algorithm aware of optimality. Benchmark examples have shown that the proposed algorithm can, in most cases, produce the optimum designs using the minimum number of adders for the given specifications. The proposed algorithm can be simply extended to the optimum design under a maximum adder depth constraint.

Linear phase FIR filters are widely used in digital signal applications such as speech coding, image processing, multirate systems, etc. Although stability and linear phase are guaranteed, the complexity and power consumption of a linear phase FIR filter are usually much higher than those of an infinite impulse response (IIR) filter which meets the same magnitude response specifications. Therefore, many efforts have been dedicated to the design of low-complexity and low-power linear phase FIR filters. In a conventional filter structure called the transposed direct form, the input signal is first multiplied by the constant filter coefficients and then goes into the delay elements. This operation is often referred to as the multiple constants multiplication problem. The constant multipliers can be realized using multiplier-less techniques in which the general multipliers are replaced by a network of shifts and adders. The adders can be further classified into structural adders and multiplier block adders. Structural adders are used to add the temporarily stored values.

An efficient semidefinite programming method is proposed for the design of a class of linear phase FIR filter banks whose filters have optimal frequency selectivity for a prescribed regularity order. The design problem is formulated as the minimization of the least square error subject to peak error constraints and regularity constraints. By using the linear matrix inequality characterization of the trigonometric semi-infinite constraints, it can then be exactly cast as a semidefinite programming problem with a small number of variables and, hence, can be solved efficiently. Finally, the image coding performance of the filter bank is presented. The filter bank has found important applications in image processing, speech processing, communications, and the construction of wavelet bases. Filter bank design is commonly formulated as a highly nonlinear optimization problem because of the perfect reconstruction condition. As a result, high-complexity algorithms are required to obtain a good solution, and the globally optimal solution is not guaranteed. To reduce the computational complexity of the design, finding filter bank structures that structurally satisfy perfect reconstruction is of great interest. Lifting structures are very attractive for the construction and implementation of filters and wavelets because the perfect reconstruction property can be structurally imposed, which offers a filter bank with low implementation complexity. However, there are certain restrictions on the frequency responses.

6. Digital FIR Filter Architecture

The architecture of the FIR filter is shown in Figure 3. The filter is a ten-tap, six-bit FIR filter using the distributed arithmetic architecture, with six bit slices stacked on top of each other. It consists of three portions [3].

6.1 Left Synchronous Portion

Receives data from the environment and processes it into partial sums. Asynchronous portion: adds the partial sums to compute the final result.

6.2 Right Synchronous Portion

The right synchronous portion synchronizes the result to the clock and produces it as an output for the environment. Data inputs enter from the left and are processed by the filter as they flow to the right. The filter can be divided into three portions, from left to right. The leftmost portion is clocked, from the input side to the domino latches. The middle portion, from the XOR gates to the end of the carry look ahead adder, is asynchronous. Finally, the rightmost portion, consisting of an output latch, is again clocked. The architecture of the filter is best understood by following the flow of data from left to right [4]. As the stream of data enters the filter, it first passes through a shift register, which stores the most recent input values that are needed to compute the filter output. In particular, for a p-tap filter, for each bit, there is a p-place shift register that stores the most recent history for that bit. These stored input values are then multiplied by their respective filter weights. The multiplication is accomplished very efficiently by fetching precomputed results from a lookup table.
[Figure: block diagram with the labels Xin, Shift Register, Decoder, Even Partial sums and Odd Partial sums register banks (16b × 8b), 16:1 multiplexers, Domino Latches, Self timed control, Carry Save adder, Carry Look Ahead Adder, Output Latch and Yout]
Fig. 3 FIR Filter Architecture
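
The lookup-table multiplication used in this architecture can be illustrated with a small software model of bit-serial distributed arithmetic. This is a minimal sketch under simplifying assumptions: it uses plain two's complement and a single table, not the signed-digit offset binary notation or the split even/odd tables of the actual design, and the names build_da_lut and da_fir_output, as well as the coefficient values, are hypothetical.

```python
def build_da_lut(coeffs):
    """Precompute every partial sum: entry `addr` is the sum of the
    coefficients whose tap index has its bit set in `addr`.
    The table has 2**K entries for K taps."""
    k = len(coeffs)
    return [
        sum(c for i, c in enumerate(coeffs) if (addr >> i) & 1)
        for addr in range(1 << k)
    ]


def da_fir_output(window, lut, bits):
    """One FIR output from the K most recent samples (`window`),
    each a signed integer that fits in `bits`-bit two's complement."""
    acc = 0
    for j in range(bits):
        # Gather bit j of every sample into one lookup address (one bit slice).
        addr = 0
        for i, x in enumerate(window):
            addr |= ((x >> j) & 1) << i
        term = lut[addr] << j
        # The sign bit carries negative weight in two's complement.
        acc += -term if j == bits - 1 else term
    return acc


coeffs = [3, -1, 4, 2]    # illustrative weights, not from the paper
window = [5, -3, 7, -1]   # most recent samples, 6-bit signed range
lut = build_da_lut(coeffs)
direct = sum(h * x for h, x in zip(coeffs, window))
print(da_fir_output(window, lut, bits=6), direct)
```

The two printed values agree, since the table lookups and shifts reproduce the multiply-accumulate sum without any multiplier. Calling build_da_lut with ten coefficients yields 1024 entries, which shows why table size becomes the limiting factor as the number of taps grows.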
The entire multiplication process is bit-sliced, with one slice for each bit of the input data. The result of the multiplications is a set of partial sums which are fed to the asynchronous portion of the filter pipeline for addition [5]. In Figure 1, the lookup table is composed of two banks of registers containing the precomputed, scaled even and odd partial sums, and two output multiplexors.

6.3 Asynchronous Portion

It is a nine-stage pipeline that adds all of the partial sums together and produces the result. Finally, this result is latched by a clocked latch and output to the right environment.

7. FILTER IMPLEMENTATION

The FIR filter implementation is now considered in more detail. The synchronous and asynchronous portions of the chip are discussed separately, followed by a discussion of the interface between the two domains [6].

7.1 Synchronous Portion

The synchronous portion of the filter consists of two parts, one at the input side of the filter, and the other at the output side.

7.2 Synchronous Input Portion

This part receives the input to the filter. The input stream consists of data values which are six bits wide [5]. A 10-slot shift register at the input side of the filter stores the 10 most recent data values. These stored input values are needed to compute the current filter output, which is a weighted sum of these values. The multiplication of inputs
by their respective filter weights is accomplished very efficiently by pre-computing all possible products and storing them in a lookup table. The entire multiplication is bit-sliced, with one slice for each of the six bits in the input data. Therefore, within each bit slice, there are 10 input bits which together form a 10-bit address for accessing the lookup table [3].

The size of the lookup table is reduced by employing two techniques. First, the 10-bit address is divided into two 5-bit addresses, one composed of only the even-index bits, and the other composed of the odd-index bits. Each of these two addresses has a distinct lookup table associated with it [6]. To understand the filter operation with a partitioned lookup table, consider a simulation of partial sum lookup. The 10-bit pattern (after passing through the decoder unit) is used to generate separate groups of even and odd-indexed bits. In particular [6], only the five even bits are used; they are forked to the even multiplexor as its select bits, and also to a clocked register where, after one clock cycle delay, they become the odd-index select bits to the bottom multiplexor, for the next clock cycle. Appropriate entries in the even and odd lookup tables are then selected and sent to the domino latches.

Second, a signed-digit offset binary notation is used to represent table entries and addresses, which enables the separation of the sign bit from each address, further shortening the addresses to 4-bit words [4]. As a result, the table size is dramatically reduced: two tables with only 16 (= 2^4) entries each are needed, as opposed to one table with 1024 (= 2^10) entries. The lookup tables are implemented using registers and multiplexors. Each table has 16 registers, each of which can store an 8-bit entry, per bit slice. Each of the tables has a 16:1 multiplexor at its output, controlled by the 4-bit address word. The odd-index address word is generated from the even-index address word by delaying it by one clock cycle [5]. The result of the multiplication is a set of products, called partial sums, that is sent to the asynchronous pipeline for addition, through the synchronous-asynchronous interface [2].

7.3 Synchronous Output Portion

The right synchronous portion simply consists of a master-slave latch that receives the final result from the asynchronous pipeline and makes it available as the filter output.

7.4 Asynchronous Portion

The asynchronous portion of the filter consists of a pipeline that lies between the synchronous input and output portions. The function of this asynchronous pipeline is to take the partial sums generated by the synchronous input portion, add them up to produce the final filter result, and send it to the synchronous output portion. The pipeline was designed using the high-capacity pipeline style. The asynchronous data path uses dynamic logic and consists of nine stages [1], [2]. The first stage is a layer of XOR gates that restores the correct sign to the partial sums. The next five stages correspond to five layers of carry save adders. The last three stages implement a carry look ahead adder. Since both true and complement values of the data bits are needed to compute the XOR and addition functions, the entire data path was implemented in dual-rail.

The data path is quite wide at the input to the first stage: 216 wires (= (8 data bits + 1 sign bit) · 2 (even and odd) · 6 (bit slices) · 2 (wires/bit)). The output of the last stage is a 15-bit result represented using 30 wires. Interestingly, since the filter has a very fine-grain data path, no explicit matched delays are required [6]. The delay of each function block is matched by the completion generator's AC element itself, through appropriate device sizing. The self-timed control of a high-capacity pipeline needs a slight modification to handle the wide data path of the filter. In particular, buffers must be inserted in order to amplify the control signals which are broadcast to the entire width of the data path [4]. Two different versions of the control were designed, one more robust and the other faster.

The two versions differ in the placement of the amplifying buffers. In the first version, the buffers amplify the control signals to the data path as well as to the completion generator. This version is very robust to variations in buffer delays because the completion signals are delayed by the same amount as the data path [3], [4]. However, the buffers are on the critical path, thus increasing the pipeline cycle time. In the second version, the completion generators use control signals that are tapped off from before the buffers. As a result, the buffer delays are taken off the critical path, resulting in a shorter cycle time. However, each stage's function block now lags behind its completion generator by an amount equal to the buffer delay. Consequently, for the pipeline to function correctly, all the stages throughout the pipeline are required to have comparable buffer delays.

7.5 Synchronous-Asynchronous Interfaces

The interface between the asynchronous and the synchronous portions of the chip must mediate certain differences in data representation and control sequencing. In particular, the asynchronous data path uses dual-rail dynamic logic, whereas the synchronous portions of the chip use single-rail static logic [5], [6]. Moreover, the asynchronous pipeline communicates by means of local
handshakes (using req's and ack's) at each end, whereas the synchronous portion uses global clocking.

8. Internal Modules

The internal modules used for the design of the digital FIR filter are the Adder, the Multiplexer and Shift-and-Add Multiplication. The respective internal modules are explained in detail as follows:

8.1 Adder

An adder is a digital circuit that performs addition of numbers. In modern computers, adders reside in the arithmetic logic unit, where other operations are also performed. Adders can be constructed for many numerical representations, such as binary coded decimal. The most common adders operate on binary numbers.

8.2 Multiplexer

A multiplexer is a device that performs multiplexing. It selects one of many analog or digital input signals and forwards the selected input to a single line. An electronic multiplexer makes it possible for several signals to share one device. The multiplexer units are used to select the appropriate output from the shift-and-add unit.

8.3 Shift-and-Add Multiplication

The shift-and-add method adds the multiplicand X to itself Y times, where Y denotes the multiplier. In binary multiplication, the digits are 0 and 1, so each step of the multiplication is simple. If the multiplier digit is 1, a copy of the multiplicand (1 × multiplicand) is placed in the proper positions; if the multiplier digit is 0, a number of 0 digits (0 × multiplicand) are placed in the proper positions.

[Figure: N-bit register B (Multiplicand), N-bit ALU with Add control, N-bit register A (Product) and N-bit register Q (Multiplier, Product), with Shift right and Write control signals]

Fig. 4 Final Version of Shift and Add Multiplication Circuit.

Consider the multiplication of positive numbers. The first version of the multiplier circuit, which implements the shift-and-add multiplication method for two n-bit numbers, is shown in the figure. The 2n-bit product register (A) is initialized to 0. Since the basic algorithm shifts the multiplicand register (B) left one position each step to align the multiplicand with the sum being accumulated in the product register, we use a 2n-bit multiplicand register with the multiplicand placed in the right half of the register and with 0 in the left half.

[Flowchart: START; B ← X, Q ← Y; A ← 0, N ← n; if Q0 = 1 then A ← A + B; shift A_Q right; N ← N − 1; if N = 0 then STOP, otherwise return to the Q0 test]

Fig. 5 Flowchart of the Final version algorithm of the Shift and Add Multiplication

Fig. 5 shows the basic steps needed for the multiplication. The algorithm starts by loading the multiplicand into the B register, loading the multiplier into the Q register, and initializing the A register to 0. The
counter N is initialized to n. The least significant bit of the

multiplier register (Q0) determines whether the Formal verification is widely used to verify traditional
multiplicand is added to the product register. The left shift standard cell ASIC designs. The term timing analysis is
of the multiplicand has the effect of shifting the used to refer to two methods called Static Timing Analyses
intermediate products to the left. The right shift of the (STA) and the timing simulation. By running formal
multiplier prepares the next bit of the multiplier to examine verification in conjunction with STA, it is confirmed that
in the next iteration.

9. FPGA Design flow
The flow chart of a typical FPGA design flow is shown in Fig. 6. The design flow and FPGA design methodologies are reviewed by eminent researchers. The design specifications are written first to describe the functionality, interface and overall architecture of the digital circuit to be designed within a short development time [5]. A behavioral description is then created to analyze the design in terms of functionality, performance, compliance with standards and other high-level issues. Behavioral descriptions can be written in an HDL and are then converted to an RTL description, also in an HDL. The designer has to describe the data flow that implements the desired digital circuit, which can then be simulated using any market-standard simulator. The functionality of the intended application is verified in the functional verification step and tested using different sets of stimuli. Logic synthesis tools convert the RTL description to a technology-independent gate-level net list, which is a description of the circuit in terms of gates and the connections between them. Behavioral synthesis tools have begun to emerge recently. These tools can create RTL descriptions from a behavioral or algorithmic description of the circuit.

Fig. 6 Flow chart of FPGA Design Flow (blocks: design specification, behavioral description, RTL description, functional verification and testing, logic synthesis, formal verification, gate level net list, logical verification and testing, place and route, static timing, power analysis, board level, device programming)

[...] the post-route net list is the same as the RTL design in functionality. Static timing analysis (STA) is one of the techniques available to verify the timing of a digital design. STA is static because the analysis of the design is carried out statically and does not depend upon the data values being applied at the input pins. An alternative approach is timing simulation, which can verify the functionality as well as the timing of the design: a stimulus is applied to the input signals, the resulting behavior is observed and verified, then time is advanced with a new input stimulus applied, the behavior is again observed and verified, and so on. Thus, timing analysis simply refers to the analysis of the design for timing issues. In power analysis, the power consumption of the implemented digital circuit is calculated to check that it satisfies the power requirement specifications. After all the specifications are met, the PROM file is generated and downloaded into the FPGA/CPLD using a JTAG cable.

The Signal Integrity (SI) stage addresses two concerns in the design: timing and the quality of the signal. The goal of signal integrity analysis is to ensure reliable high-speed data transmission. In a digital system, a signal is transmitted from one component to another in the form of logic '1' or '0', each of which corresponds to a certain reference voltage level. The receiving component needs to sample the data in order to obtain the binary coded information. Any delay or distortion of the data will result in a failure of the data transmission. The SI check plays an important role in high-speed FPGA design in order to satisfy the quality of the signal at the far end of the board route, as well as the propagation delay. Major FPGA vendors provide ISE comprising simulator, synthesizer and implementation tools, and third-party support is available for simulation, synthesis and power analysis, depending on the design requirement.

10. Results and Discussions
In this paper, internal modules such as the adder, the multiplexer and shift-and-add multiplication are used as the basic modules. First, the basic modules are simulated and the results are presented. Then the designed digital FIR filter is simulated for different orders and frequencies and the results are presented.
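As an illustration of the shift-and-add multiplication used as a basic module, the following Python sketch models the register-level behavior in software. It is a minimal sketch and not the authors' VHDL: the function name shift_add_multiply, the multiplicand register name m, the default 8-bit width and the restriction to unsigned operands are assumptions made for the example, while a, q and the loop count mirror the A, Q and N registers of Section 8.3.

```python
def shift_add_multiply(multiplicand: int, multiplier: int, n_bits: int = 8) -> int:
    """Multiply two unsigned n_bits-wide integers by shift-and-add.

    Hypothetical software model of the basic module; unsigned operands assumed.
    """
    mask = (1 << n_bits) - 1
    if not (0 <= multiplicand <= mask and 0 <= multiplier <= mask):
        raise ValueError("operands must fit in n_bits unsigned bits")

    a = 0             # A: accumulator, ends up as the upper half of the product
    q = multiplier    # Q: multiplier, ends up as the lower half of the product
    m = multiplicand  # M: multiplicand, held constant (name is an assumption)

    for _ in range(n_bits):      # plays the role of counter N counting n steps
        if q & 1:                # test the least significant bit of Q
            a += m               # add the multiplicand into A (may carry into bit n)
        # Shift the combined (carry, A, Q) register right by one bit:
        # the LSB of A moves into the MSB of Q.
        q = (q >> 1) | ((a & 1) << (n_bits - 1))
        a >>= 1

    return (a << n_bits) | q     # 2 * n_bits wide product


if __name__ == "__main__":
    print(shift_add_multiply(13, 11))
```

For example, shift_add_multiply(13, 11) returns 143. The product needs 2n bits, which is why A and Q are treated as one double-width register during the shift.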
Fig. 7 Simulation Result of Adder
Fig. 8 Simulation Result of Multiplexer
Fig. 9 Simulation Result of Shift and Add multiplication

10.1 Simulation results of digital FIR filter with different orders
The simulated results of the digital FIR filter for orders 3, 6 and 15 are presented below with their order, cutoff frequencies and sampling frequencies respectively. As the order of the filter increases, the performance of the filter increases. The magnitude of the output signal decreases when the frequency of the input signal increases. The response of the band stop filter is clearly visible when the order of the filter increases.

Fig. 10 FIR Filter output of order 3, cutoff frequency 4 MHz, sampling rate 50 MHz
Fig. 11 FIR Filter output of order 6, cutoff frequency 4 MHz, sampling rate 50 MHz
Fig. 12 FIR Filter output of order 15, cutoff frequency 4 MHz, sampling rate 50 MHz
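The trend reported for orders 3, 6 and 15 (output magnitude falling as the input frequency rises, with a sharper response at higher order) can be explored numerically. The Python sketch below is an assumption-laden illustration and not the design method of this paper: it assumes a low-pass response with the single 4 MHz cutoff and 50 MHz sampling rate given in the figure captions, a Hamming-windowed sinc design, and floating-point coefficients rather than the fixed-point hardware. The function names lowpass_fir_taps and magnitude are hypothetical.

```python
import cmath
import math


def lowpass_fir_taps(order, cutoff_hz, fs_hz):
    """Hamming-windowed sinc low-pass design; an order-N filter has N + 1 taps.

    Assumed design method for illustration only (order must be >= 1).
    """
    fc = cutoff_hz / fs_hz                 # normalized cutoff, cycles per sample
    mid = order / 2.0                      # center of the impulse response
    taps = []
    for n in range(order + 1):
        k = n - mid
        if k == 0:
            ideal = 2.0 * fc               # limit of sin(2*pi*fc*k)/(pi*k) at k = 0
        else:
            ideal = math.sin(2.0 * math.pi * fc * k) / (math.pi * k)
        window = 0.54 - 0.46 * math.cos(2.0 * math.pi * n / order)  # Hamming
        taps.append(ideal * window)
    dc_gain = sum(taps)
    return [t / dc_gain for t in taps]     # normalize to unity gain at DC


def magnitude(taps, freq_hz, fs_hz):
    """Magnitude of the frequency response |H(e^{jw})| at freq_hz."""
    w = 2.0 * math.pi * freq_hz / fs_hz
    return abs(sum(h * cmath.exp(-1j * w * n) for n, h in enumerate(taps)))


FS_HZ = 50e6       # sampling rate from the figure captions
CUTOFF_HZ = 4e6    # cutoff frequency from the figure captions

if __name__ == "__main__":
    for order in (3, 6, 15):
        taps = lowpass_fir_taps(order, CUTOFF_HZ, FS_HZ)
        gains = [magnitude(taps, f, FS_HZ) for f in (1e6, 4e6, 10e6, 20e6)]
        print(order, [round(g, 4) for g in gains])
```

Running the script prints, for each order, the gain at 1, 4, 10 and 20 MHz, so the attenuation at higher input frequencies can be compared across orders. A different window or design method would change the numbers but not the general trend.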
10.2 Synthesis Report
The FIR filter designed is synthesized using Xilinx 9.1i and the device utilization report is presented for orders 3, 6 and 15 in Table 1 to Table 3 respectively.

Table 1: Synthesis report of FIR filter of order 3
Logic Utilization             Used   Available   Utilization
Number of Slices                64        3584            1%
Number of Slice Flip Flops      84        7168            1%
Number of 4 input LUTs          67        7168            0%
Number of bonded IOBs           27         141           19%
Number of GCLKs                  1           8           12%

Table 2: Synthesis report of FIR filter of order 6
Logic Utilization             Used   Available   Utilization
Number of Slices               113        3584            3%
Number of Slice Flip Flops     121        7168            1%
Number of 4 input LUTs         125        7168            1%
Number of bonded IOBs           29         141           20%

Table 3: Synthesis report of FIR filter of order 15
Logic Utilization             Used   Available   Utilization
Number of Slices               219        3584            6%
Number of Slice Flip Flops     198        7168            2%
Number of 4 input LUTs         365        7168            5%
Number of bonded IOBs           30         141           21%
Number of GCLKs                  1           8           12%

11. Conclusion
A hybrid synchronous and asynchronous digital FIR filter has been designed and implemented in FPGA using VHDL. The digital FIR filter of high throughput and low latency, operating at above 1.3 GHz, has been designed. An adaptive high-capacity pipeline was introduced in the hybrid synchronous-asynchronous design of the filter. The degree of pipelining is dynamically variable depending upon the input. Concurrent execution of software or program can be achieved in FPGA through parallel processing. The designed digital FIR filter is simulated using ModelSim and synthesized using Xilinx. The simulated results of the digital FIR filter are presented for different orders. As the order of the filter increases, the performance of the filter increases. The magnitude of the output signal decreases when the frequency of the input signal increases. The response of the band stop filter is clearly visible when the order of the filter increases. The FIR filter designed is synthesized in Xilinx 9.1i and the device utilization report is presented for filters of order 3, 6 and 15 respectively.

12. References
[1] A. V. Oppenheim and R. W. Schafer, Discrete-Time Signal Processing. Upper Saddle River, NJ: Prentice Hall, 1989.
[2] Ching-Tang Chang, Kenneth Rose and Robert A. Walker, “High-Level DSP Synthesis Using the COMET Design System,” IEEE Trans. on digital signal processing systems, vol. 8, no. 6, pp. 408-411, 2010.
[3] Douglas L. Perry, VHDL: Programming by Example. New York: McGraw-Hill, 2002.
[4] Guo Gaizhi, Zhang Pengju, Yu Zongzuo, Wang Hailong, “Design and Implementation of FIR Digital Wave Filter Based on DSP,” IEEE Trans. on digital signal processing systems, vol. 489-491, no. 978, pp. 978-989, 2010.
[5] J. J. Rodriguez-Andina, M. J. Moure, and M. D. Valdes, “Features, design tools, and application domains of FPGAs,” IEEE Trans. Ind. Electron., vol. 54, no. 4, pp. 1810-1823, 2007.
[6] J. Tierno, A. S. Rylyakov, S. Rylov, M. Singh, P. Ampadu, S. Nowick, M. Immediato, and S. Gowda, “A 1.3 GSample/s 10-tap full-rate variable-latency self-timed FIR filter with clocked interfaces,” in Proc. Int. Solid State Circuit Conference, San Francisco, CA, p. 444, Feb. 2002.
[7] Montek Singh, Jose A. Tierno, Alexander Rylyakov, Sergey Rylov, and Steven M. Nowick, “An Adaptively Pipelined Mixed Synchronous-Asynchronous Digital FIR Filter Chip Operating at 1.3 GHz,” IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 18, no. 7, July 2010.
[8] M. Singh, J. A. Tierno, A. Rylyakov, S. Rylov, and S. M. Nowick, “An adaptively-pipelined mixed synchronous-asynchronous digital FIR filter chip operating at 1.3 GHz,” in Proc. IEEE Int. Symp. Asynchronous Circuits and Systems, Manchester, U.K., pp. 84-95, Apr. 2002.
[9] Peter J. Ashenden, “VHDL Tutorial,” Ashenden Designs Pty. Ltd., Elsevier Science, USA, 2004.
[10] Ramesh Babu, “Digital Signal Processing,” TATA McGraw Hill, 2007.
[11] Xilinx Corporation, “Xilinx Spartan-3E FPGA family: Complete data sheet,” 2007. [Online]. Available: http://www.xilinx.com.
