4 Ijcsi
Copyright (c) 2012 International Journal of Computer Science Issues. All Rights Reserved.
IJCSI International Journal of Computer Science Issues, Vol. 9, Issue 4, No 1, July 2012
ISSN (Online): 1694-0814
www.IJCSI.org 396
DSP functions such as FIR filters, which are extensively used in multiple applications in telecommunications, wireless or satellite communications, video and audio processing, biomedical signal processing and many others. Traditionally, design methods focused mainly on multiplier-based architectures to implement the multiply-and-accumulate (MAC) blocks that constitute the central piece of FIR filters and several other DSP functions. But careful analysis shows that multiplier-based filter implementations may become highly expensive [2], [4].

filters of high order. A flexible architecture that gradually replaces LUT requirements with multiplexer/adder pairs was introduced. An asymmetric FIR filter architecture using the bit-serial LUT-based DA technique is presented. For this implementation, we use a scheme that takes advantage of the 4-input LUTs in FPGAs and rearranges the input sequence to implement a modified version of the shifter/accumulator stage. We show that our modified version is superior in terms of area to previous LUT-less DA architectures [5], [7].
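To make the bit-serial LUT-based DA idea concrete, the following Python sketch computes one FIR output sample for a 4-tap filter, so that each LUT address has four bits, as in a 4-input LUT. The function names, the LSB-first bit order and the two's-complement input format are assumptions made for illustration; they are not taken from the architecture described above.

```python
def build_da_lut(coeffs):
    """Entry at address a holds the sum of the coefficients whose tap bit is set in a."""
    n = len(coeffs)
    return [sum(c for k, c in enumerate(coeffs) if (addr >> k) & 1)
            for addr in range(1 << n)]


def da_inner_product(coeffs, samples, width):
    """Bit-serial distributed arithmetic: one LUT access per input bit, LSB first.

    `samples` are signed integers that fit in `width`-bit two's complement
    (an assumed input format).
    """
    lut = build_da_lut(coeffs)
    acc = 0
    for b in range(width):
        # Gather bit b of every sample into one LUT address.
        addr = 0
        for k, x in enumerate(samples):
            addr |= ((x >> b) & 1) << k
        partial = lut[addr]
        if b == width - 1:
            acc -= partial << b   # the sign bit carries weight -2^(width-1)
        else:
            acc += partial << b   # shift-and-accumulate step
    return acc


coeffs = [3, -1, 4, 2]        # hypothetical 4-tap coefficients
samples = [5, -7, 0, 12]      # x[n], x[n-1], x[n-2], x[n-3]
result = da_inner_product(coeffs, samples, width=8)
```

Here `result` equals the direct sum of products of `coeffs` and `samples`, obtained without any multiplier. A hardware shifter/accumulator would typically shift the accumulator right by one bit per cycle instead of shifting each partial sum left; the arithmetic is equivalent.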
5. Design Concepts of Digital FIR Filter

FIR filters are commonly designed on DSP and FPGA platforms. Therefore, the basic design concepts using DSP and FPGA are discussed below.

5.1 Design Concepts of Digital FIR Filter using DSP

A design method for FIR digital filters is based on a fixed-point DSP processor, in which the filter coefficients are obtained and verified with the DSP measuring system. All of the digital filter's functions met design expectations. Filtering plays a significant role in digital signal processing. Digital filtering is a basic calculation method for speech and graphics processing, pattern recognition, and spectrum analysis. This method has many advantages over an analogue filter, such as broad design amplitude, guaranteed precision, and accurate linear phase, as well as immunity to voltage drift, temperature drift, and noise. Since its response to a unit impulse is a sequence of finite length, an FIR filter is always stable. In addition to those advantages, digital filtering using a DSP chip is flexible: it is convenient to change the filter's parameters and easy to modify its characteristics.

Methodologies for high-level synthesis of dedicated DSP architectures using the COMET design system are in use. The system is tuned to the synthesis of DSP ASICs from behavioral specifications written in VHDL. COMET is capable of generating more efficient architectures using innovative scheduling and resource allocation algorithms which exploit the cluster information and maximize the parallel tasks. With these transformations, major improvements are achieved with fewer registers and interconnections; an industrial quality design is then derived in both FIR and elliptic filter examples. Filter banks are often used in signal and image processing applications for dividing a signal into frequency bands and reconstructing the signal from the individual bands. The Quadrature Mirror Filter, one particular application using the sub-band coding technique, has notable advantages for image compression and restoration compared with the Discrete Cosine Transform. Silicon compilation has become essential to automate the VLSI design of DSP systems as chips increase in size and complexity. High-level synthesis, an important front-end task starting from an algorithmic behavioral specification, has received a lot of attention in both academic and industrial environments. Generally, the input description is converted into a Data Flow Graph and all synthesis tasks work from this Data Flow Graph. Behavioral synthesis is a complex task composed primarily of two interacting subtasks: scheduling and allocation. A great deal of progress has been made on the theory of high-level synthesis, with promising results [2, 4].

5.2 Design Concepts of Digital FIR Filter using FPGA

Implementation of the filter requires considerably fewer resources than the previous design using DSP. It requires about half the resources in terms of configurable blocks and lookup tables. The saving in the adder chain is not so high, since most of the adder tree size is dictated by the coefficient size, not by the sample size. The lookup tables must be writable. This increases complexity, especially in terms of routing resources. The mixer multiplier must be implemented using hard multipliers, not lookup tables. A single large lookup table to hold sine/cosine values is still needed. Especially for Altera FPGAs, this is a large advantage, as these chips have smaller RAM blocks, but also one or two large RAMs. Re-tuning the band is relatively slow [5, 11]. The filter has no capability for frequency hopping. This is not a requirement, and tap reloading is in any case faster than for a full 1024-tap filter. Some intelligence is needed in the control processor to recalculate filter taps from the low-pass prototype, but this is within the capabilities of any current microprocessor. We used the ModelSim tool to determine the filter coefficients, designed a 16th-order constant-coefficient FIR filter in VHDL [3, 9], and simulated the filters; the results meet the performance requirements.

As the word indicates, a filter separates a desired signal from unwanted disturbances. When we want to remove a disturbance such as noise from an audio signal, we design an appropriate filter that passes only the desired signal. But only in a few cases can we remove the disturbance completely and recover the desired signal; most of the time we have to settle for a compromise in which most of the disturbance is rejected and most of the signal is recovered. The first candidate filter is a linear filter. The main reason for this choice is that we have a good understanding of how a linear system operates. It is only when a linear design fails or yields unsatisfactory results that we look for other solutions, such as nonlinear or adaptive techniques. Digital filters include infinite impulse response (IIR) digital filters and finite impulse response (FIR) digital filters. Because an FIR system has many good features, such as having only zeros, guaranteed stability, high operation speed, linear phase characteristics and design flexibility, FIR filters have been widely used in digital audio, image processing, data transmission, biomedical and other areas. An FIR filter can be implemented in a variety of ways. With the progress of modern electronic technology, the use of field programmable gate arrays (FPGAs) for digital signal processing has developed rapidly; given the FPGA's advantages of high integration, high speed and reliability, FIR filter implementation using FPGA is becoming a trend. The algorithm is proposed for the design of low complexity
linear phase finite impulse response (FIR) filters with optimum discrete coefficients. The proposed algorithm, based on mixed integer linear programming, efficiently traverses the discrete coefficient solutions and searches for the optimum one that results in an implementation using the minimum number of adders. During the search, discrete coefficients are dynamically synthesized based on a continuously updated subexpression space and, most essentially, a monitoring mechanism is introduced to make the algorithm aware of optimality. Benchmark examples have shown that the proposed algorithm can, in most cases, produce the optimum designs using the minimum number of adders for the given specifications. The proposed algorithm can be simply extended to the optimum design under a maximum adder depth constraint. Linear phase finite impulse response (FIR) filters are widely used in digital signal applications such as speech coding, image processing, multirate systems, etc. Although stability and linear phase are guaranteed, the complexity and power consumption of a linear phase FIR filter are usually much higher than those of an infinite impulse response (IIR) filter which meets the same magnitude response specifications. Therefore, many efforts have been dedicated to the design of low-complexity and low-power linear phase FIR filters. In a conventional filter structure, called the transposed direct form, the input signal is first multiplied by the constant filter coefficients and then goes into the delay elements. This operation is often referred to as the multiple constant multiplication problem. The constant multipliers can be realized using multiplierless techniques, where the general multipliers are replaced by a network of shifts and adders. The adders can be further classified into structural adders and multiplier block adders. Structural adders are used to add the temporarily stored values. An efficient semidefinite programming method is proposed for the design of a class of linear phase finite impulse response filter banks whose filters have optimal frequency selectivity for a prescribed regularity order. The design problem is formulated as the minimization of the least square error subject to peak error constraints and regularity constraints. By using the linear matrix inequality characterization of the trigonometric semi-infinite constraints, it can then be exactly cast as a semidefinite programming problem with a small number of variables and, hence, can be solved efficiently. Finally, the image coding performance of the filter bank is presented. The filter has found important applications in image processing, speech processing, communications, and the construction of wavelet bases. The filter bank design is commonly formulated as a highly nonlinear optimization problem because of the perfect reconstruction condition. As a result, high complexity algorithms are required to obtain a good solution, and the globally optimal solution is not guaranteed. To reduce the computational complexity of the design, finding filter bank structures that structurally satisfy perfect reconstruction is of great interest. Lifting structures are very attractive for the construction and implementation of filters and wavelets because the perfect reconstruction property can be structurally imposed, which offers a filter bank with low implementation complexity. However, there are certain restrictions on the frequency responses.

6. Digital FIR Filter Architecture

The architecture of the FIR filter is shown in Figure 3. The filter is a ten-tap six-bit FIR filter using the distributed arithmetic architecture. It is built from six bit slices, stacked on top of each other. It consists of three portions [3].

6.1 Left Synchronous Portion

The left synchronous portion receives data from the environment and processes it into partial sums. The asynchronous portion adds the partial sums to compute the final result.

6.2 Right Synchronous Portion

The right synchronous portion synchronizes the result to the clock and produces it as an output for the environment. Data inputs enter from the left, and are processed by the filter as they flow to the right. The filter can be divided into three portions, from left to right. The leftmost portion is clocked, from the input side to the domino latches. The middle portion, from the XOR gates to the end of the carry look-ahead adder, is asynchronous. Finally, the rightmost portion, consisting of an output latch, is again clocked. The architecture of the filter is best understood by following the flow of data from left to right [4]. As the stream of data enters the filter, it first passes through a shift register, which stores the most recent input values that are needed to compute the filter output. In particular, for a p-tap filter, for each bit, there is a p-place shift register that stores the most recent history for that bit. These stored input values are then multiplied by their respective filter weights. The multiplication is accomplished very efficiently by fetching precomputed results from a lookup table.
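The per-bit shift registers and the table lookup of the ten-tap six-bit architecture can be sketched behaviorally as follows. This is only a software model under stated assumptions: the class name `BitSlicedFir`, the unsigned input format, the integer weights and the single unpartitioned 1024-entry table are illustrative choices, not details of the actual design.

```python
from collections import deque

TAPS, BITS = 10, 6  # ten-tap, six-bit filter


class BitSlicedFir:
    """Behavioral model: one TAPS-place shift register per input bit."""

    def __init__(self, weights):
        assert len(weights) == TAPS
        # Precomputed table: entry a = sum of the weights whose tap bit is set in a.
        self.lut = [sum(w for k, w in enumerate(weights) if (a >> k) & 1)
                    for a in range(1 << TAPS)]
        # One shift register per bit slice, initially all zeros.
        self.slices = [deque([0] * TAPS, maxlen=TAPS) for _ in range(BITS)]

    def step(self, x):
        """Shift in one unsigned BITS-bit sample and return the filter output."""
        assert 0 <= x < (1 << BITS)
        y = 0
        for b, reg in enumerate(self.slices):
            reg.appendleft((x >> b) & 1)          # newest bit sits at position 0
            addr = sum(bit << k for k, bit in enumerate(reg))
            y += self.lut[addr] << b              # partial sum weighted by 2^b
        return y


weights = [1, -2, 3, 4, -5, 6, 7, -8, 9, 10]      # hypothetical integer weights
fir = BitSlicedFir(weights)
outputs = [fir.step(x) for x in [5, 63, 0, 17, 42]]
```

Each call to `step` performs six table lookups, one per bit slice, and sums the six partial results; no multiplication of a sample by a weight is ever carried out at run time.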
[Figure: block diagram of the FIR filter architecture. Recoverable labels: Xin, shift register, decoder, 16:1 (two instances), register, domino latch (two instances), carry save adder, carry look-ahead adder, output latch, Yout; bus widths 4b, 8b and 16b.]
The entire multiplication process is bit-sliced, with one slice for each bit of the input data. The result of the multiplications is a set of partial sums which are fed to the asynchronous portion of the filter pipeline for addition [5]. In Figure 1, the lookup table is composed of two banks of registers containing the precomputed, scaled even and odd partial sums, and two output multiplexors.

6.3 Asynchronous Portion

It is a nine-stage pipeline that adds all of the partial sums together and produces the result. Finally, this result is latched by a clocked latch and output to the right environment.

7. FILTER IMPLEMENTATION

The FIR filter implementation is now considered in more detail. The synchronous and asynchronous portions of the chip are discussed separately, followed by a discussion of the interface between the two domains [6].

7.1 Synchronous Portion

The synchronous portion of the filter consists of two parts, one at the input side of the filter, and the other at the output side.

7.2 Synchronous Input Portion

This part receives the input to the filter. The input stream consists of data values which are six bits wide [5]. A 10-slot shift register at the input side of the filter stores the 10 most recent data values. These stored input values are needed to compute the current filter output, which is a weighted sum of these values. The multiplication of inputs
by their respective filter weights is accomplished very efficiently by pre-computing all possible products and storing them in a lookup table. The entire multiplication is bit-sliced, with one slice for each of the six bits in the input data. Therefore, within each bit slice, there are 10 input bits which together form a 10-bit address for accessing the lookup table [3].

The size of the lookup table is reduced by employing two techniques. First, the 10-bit address is divided into two 5-bit addresses, one composed of only the even-index bits, and the other composed of the odd-index bits. Each of these two addresses has a distinct lookup table associated with it [6]. To understand the filter operation with a partitioned lookup table, consider a simulation of partial sum lookup. The 10-bit pattern (after passing through the decoder unit) is used to generate separate groups of even- and odd-indexed bits. In particular [6], only the five even bits are used; they are forked to the even multiplexor as its select bits, and also to a clocked register where, after a delay of one clock cycle, they become the odd-index select bits to the bottom multiplexor for the next clock cycle. Appropriate entries in the even and odd lookup tables are then selected and sent to the domino latches.

Second, a signed-digit offset binary notation is used to represent table entries and addresses, which enables the separation of the sign bit from each address, further shortening the addresses to 4-bit words [4]. As a result, the table size is dramatically reduced: two tables with only 16 (= 2^4) entries each are needed, as opposed to one table with 1024 (= 2^10) entries. The lookup tables are implemented using registers and multiplexors. Each table has 16 registers, each of which can store an 8-bit entry, per bit slice. Each of the tables has a 16:1 multiplexor at its output, controlled by the 4-bit address word. The odd-index address word is generated from the even-index address word by delaying it by one clock cycle [5]. The result of the multiplication is a set of products, called partial sums, that is sent to the asynchronous pipeline for addition, through the synchronous-asynchronous interface [2].

7.3 Synchronous Output Portion

The right synchronous portion simply consists of a master-slave latch that receives the final result from the asynchronous pipeline and makes it available as the filter output.

7.4 Asynchronous Portion

The asynchronous portion of the filter consists of a pipeline that lies between the synchronous input and output portions. The function of this asynchronous pipeline is to take the partial sums generated by the synchronous input portion, add them up to produce the final filter result, and send it to the synchronous output portion. The pipeline was designed using the high-capacity pipeline style. The asynchronous data path uses dynamic logic, and consists of nine stages [1], [2]. The first stage is a layer of XOR gates that restores the correct sign to the partial sums. The next five stages correspond to five layers of carry save adders. The last three stages implement a carry look-ahead adder. Since both true and complement values of the data bits are needed to compute the XOR and addition functions, the entire data path was implemented in dual-rail.

The data path is quite wide at the input to the first stage: 216 wires (= (8 data bits + 1 sign bit) · 2 (even and odd) · 6 (bit slices) · 2 (wires/bit)). The output of the last stage is a 15-bit result represented using 30 wires. Interestingly, since the filter has a very fine-grain data path, no explicit matched delays are required [6]. The delay of each function block is matched by the completion generator's AC element itself, through appropriate device sizing. The self-timed control of a high-capacity pipeline needs a slight modification to handle the wide data path of the filter. In particular, buffers must be inserted in order to amplify the control signals which are broadcast to the entire width of the data path [4]. Two different versions of the control were designed, one more robust and the other faster.

The two versions differ in the placement of the amplifying buffers. In the first version, the buffers amplify the control signals for the data path as well as for the completion generator. This version is very robust to variations in buffer delays because the completion signals are delayed by the same amount as the data path [3], [4]. However, the buffers are on the critical path, thus increasing the pipeline cycle time. In the second version, the completion generators use control signals that are tapped off before the buffers. As a result, the buffer delays are taken off the critical path, resulting in a shorter cycle time. However, each stage's function block now lags behind its completion generator by an amount equal to the buffer delay. Consequently, for the pipeline to function correctly, all the stages throughout the pipeline are required to have comparable buffer delays.

7.5 Synchronous-Asynchronous Interfaces

The interface between the asynchronous and the synchronous portions of the chip must mediate certain differences in data representation and control sequencing. In particular, the asynchronous data path uses dual-rail dynamic logic, whereas the synchronous portions of the chip use single-rail static logic [5], [6]. Moreover, the asynchronous pipeline communicates by means of local
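Returning to the partitioned lookup table of Section 7.2 and the sign-restoring first stage of Section 7.4, the sketch below shows one standard way in which an even/odd split combined with a separated sign bit reduces a 1024-entry table to two 16-entry tables. The exact number representation of the chip is not given in the text, so the details here are assumptions: samples are unsigned six-bit integers, each bit is recoded as a digit of +1 or -1 (offset binary coding), weights are integers, and the helper names are hypothetical. The time multiplexing of the odd address (delaying the even address by one clock) is not modeled.

```python
BITS = 6  # six-bit unsigned samples (assumed format)


def half_table(weights5):
    """16-entry table for five weights under +1/-1 digit coding.

    Address bits 0..3 choose the signs of weights 0..3 (1 -> +w, 0 -> -w).
    The fifth weight is always stored as +w; its true sign is restored later.
    """
    assert len(weights5) == 5
    table = []
    for a in range(16):
        s = weights5[4]
        for k in range(4):
            s += weights5[k] if (a >> k) & 1 else -weights5[k]
        table.append(s)
    return table


def lookup(table, bits5):
    """Look up five 0/1 bits, using bit 4 as the separated sign bit."""
    sign = bits5[4]
    addr = 0
    for k in range(4):
        # XOR with the inverted sign bit folds the pattern onto the stored half.
        addr |= (bits5[k] ^ (sign ^ 1)) << k
    value = table[addr]
    return value if sign else -value   # sign restoration step


def fir_output(weights, history):
    """history[k] = x[n-k]; returns sum_k weights[k] * history[k]."""
    even_t = half_table(weights[0::2])
    odd_t = half_table(weights[1::2])
    twice_y = ((1 << BITS) - 1) * sum(weights)   # constant offset of the recoding
    for b in range(BITS):
        bits = [(x >> b) & 1 for x in history]
        partial = lookup(even_t, bits[0::2]) + lookup(odd_t, bits[1::2])
        twice_y += partial << b
    return twice_y // 2   # twice_y is always even, so the division is exact


weights = [3, -7, 12, 5, -9, 20, 1, -4, 8, 6]    # hypothetical integer weights
history = [63, 0, 17, 42, 1, 33, 8, 60, 21, 9]
y = fir_output(weights, history)
```

The folding works because negating every +1/-1 digit negates the table entry, so only the half of the table whose fifth digit is +1 has to be stored. In this model `y` equals the direct weighted sum of `history`.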
8.2 Multiplexer

A multiplexer is a device that performs multiplexing: it selects one of many analog or digital input signals and forwards the selected input to a single line. An electronic multiplexer makes it possible for several signals to share one device. The multiplexer units are used to select the appropriate

[Figure: block diagram and flowchart of the shift-and-add multiplier. Recoverable labels: B (multiplicand, N bits), ALU, add, control, shift right, write, N bits; flowchart decision "Q0 = 1" with Yes and No branches, operation A+B, and STOP.]

Fig. 5 Flowchart of the final version of the shift-and-add multiplication algorithm
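The shift-and-add multiplication summarized by the Fig. 5 flowchart can be sketched in Python as below. The register names A, Q and B follow the labels that survive from the figure; the unsigned operand format and the function name are assumptions made for this illustration.

```python
def shift_add_multiply(multiplicand, multiplier, n):
    """Multiply two unsigned n-bit integers with an add/shift-right loop.

    A is the accumulator, Q initially holds the multiplier, and B holds
    the multiplicand. After n iterations the product is held in (A, Q).
    """
    assert 0 <= multiplicand < (1 << n) and 0 <= multiplier < (1 << n)
    a, q, b = 0, multiplier, multiplicand
    for _ in range(n):
        if q & 1:          # Q0 = 1: add the multiplicand to A
            a += b         # A may temporarily need n + 1 bits (carry out)
        # Shift the combined register (A, Q) right by one bit.
        q = (q >> 1) | ((a & 1) << (n - 1))
        a >>= 1
    return (a << n) | q


product = shift_add_multiply(13, 11, n=4)   # 143
```

Each iteration inspects only the least significant bit of Q, so a single adder and a shift register are sufficient, at the cost of n clock cycles per product.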
10. RESULTS & DISCUSSIONS
Fig. 10 FIR filter output of order 3, cutoff frequency 4 MHz, sampling rate 50 MHz

Fig. 11 FIR filter output of order 6, cutoff frequency 4 MHz, sampling rate 50 MHz
10.2 Synthesis Report

The designed FIR filter is synthesized using Xilinx 9.1i, and the device utilization report is presented for orders 3, 6 and 15 in Table 1 to Table 3 respectively.

Table 1: Synthesis report of FIR filter of order 3

Logic utilization    Used    Available    Utilization

11. Conclusion

... using ModelSim and synthesized using Xilinx. The simulated results of the digital FIR filter are presented for different orders. As the order of the filter increases, the performance of the filter increases. The magnitude of the output signal decreases when the frequency of the input signal increases. The response of the band stop filter is clearly visible when the order of the filter increases. The designed FIR filter is synthesized in Xilinx 9.1i and the device utilization report is presented for filters of order 3, 6 and 15 respectively.
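As a rough illustration of the observation that performance improves with filter order, the sketch below designs low-pass filters of order 3, 6 and 15 for a 4 MHz cutoff at a 50 MHz sampling rate and compares their worst-case stopband gain. The Hamming-windowed sinc design, the 12 MHz stopband edge and all function names are assumptions chosen for this example; they are not the design method or the coefficients used in the paper.

```python
import math


def lowpass_taps(order, cutoff_hz, fs_hz):
    """Hamming-windowed sinc low-pass with order + 1 taps and unit DC gain."""
    n_taps = order + 1
    fc = cutoff_hz / fs_hz                      # cutoff in cycles per sample
    mid = (n_taps - 1) / 2
    taps = []
    for n in range(n_taps):
        t = n - mid
        ideal = 2 * fc if t == 0 else math.sin(2 * math.pi * fc * t) / (math.pi * t)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (n_taps - 1))
        taps.append(ideal * window)
    gain = sum(taps)
    return [h / gain for h in taps]


def magnitude(taps, f_hz, fs_hz):
    """Magnitude of the frequency response at f_hz."""
    w = 2 * math.pi * f_hz / fs_hz
    re = sum(h * math.cos(w * n) for n, h in enumerate(taps))
    im = -sum(h * math.sin(w * n) for n, h in enumerate(taps))
    return math.hypot(re, im)


def worst_stopband(order, fs_hz=50e6, cutoff_hz=4e6):
    """Largest gain between 12 MHz (assumed stopband edge) and 25 MHz."""
    taps = lowpass_taps(order, cutoff_hz, fs_hz)
    freqs = range(12_000_000, 25_000_001, 100_000)
    return max(magnitude(taps, f, fs_hz) for f in freqs)


for order in (3, 6, 15):
    print(order, worst_stopband(order))
```

Under these assumptions the worst-case stopband gain shrinks as the order grows from 3 to 6 to 15, which is consistent with the trend reported for the simulated filters.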