FFT Using VHDL

Download as docx, pdf, or txt
Download as docx, pdf, or txt
You are on page 1of 22

1).

Fast Fourier Transform (FFT):


Fast Fourier Transform is an algorithm to compute Discrete Fourier Transform(DFT).
DFT is used to convert a time domain signal into its frequency spectrum domain. FFT is an
efficient algorithm to compute DFT. There are many distinct FFT algorithms involving a wide
range of mathematics, from simple complex-number arithmetic to group theory and number
theory.
FFTs are algorithms for quick calculation of discrete Fourier transform of a data vector. The
FFT is a DFT algorithm which reduces the number of computations needed for N points from O(N 2)
to O(N log N) where log is the base-2 logarithm.

2-point DIT-FFT Algorithm :


2-point FFT butterfly will have 2 inputs and 2 outputs. Butterfly model is shown below

Where twiddle factor =>


Here N=2 and k=0 so the twiddle factor value is=1
Further the simplified values of outputs are
X(0)=A(0)+WnB(0);
X(1)=A(0)-WnB(0);

A = x + Wn * y;
A_r + j*A_i = (x_r + j* x_i) + ( w_r + j*w_i) *(y_r + j*y_i);
A_r + j*A_i = (x_r + (w_r * y_r ) ( w_i *y_i)) + j*(x_i +( w_i *y_r) +( w_r
* y_i));

A_r = x_r + (w_r * y_r ) ( w_i *y_i);


A_i = x_i +( w_i *y_r) +( w_r * y_i);

B = x-+ Wn * y;
B_r + j*B_i = (x_r + j* x_i) - ( w_r + j*w_i) *(y_r + j*y_i);
B_r + j*B_i = ( x_r - (w_r *y_r) + (w_i * y_i)) + (x_i (w_r * y_i ) (w_i *
y_r ));
B_r =( x_r - (w_r *y_r) + (w_i * y_i);
B_i = x_i (w_r * y_i ) (w_i * y_r );

IMPLIMENTATION OF DIT-FFT ALGORITHM USING VHDL CODE:


-- The behavioral model outputs of 2 point fft algorithm model are
-- A_r = x_r + (y_r*w_r) - (y_i*w_i);
-- B_r = x_r - (y_r*w_r) + (y_i*w_i);
-- A_i = x_i + (y_r*w_i) + (y_i*w_r);
-- B_i = x_i - (y_r*w_i) - (y_i*w_r);
--------------------------------------------------LIBRARY IEEE;
USE IEEE.STD_LOGIC_1164.ALL;
USE IEEE.STD_LOGIC_UNSIGNED.ALL;
USE IEEE.STD_LOGIC_ARITH.ALL;

Entity fft_radix_2 is
port (
x_r

: inout std_logic_vector(7 downto 0);

y_r

: inout std_logic_vector(7 downto 0);

x_i

: inout std_logic_vector(7 downto 0);

y_i

: inout std_logic_vector(7 downto 0);

w_r

: inout std_logic_vector(7 downto 0);

w_i

: inout std_logic_vector(7 downto 0);

A_r

: out std_logic_vector(7 downto 0);

A_i

: out std_logic_vector(7 downto 0);

B_r

: out std_logic_vector(7 downto 0);

B_i

: out std_logic_vector(7 downto 0)

);
end fft_radix_2;
architecture behavioral of fft_radix_2 is
signal product_1,product_2 : std_logic_vector(15 downto 0);
begin
product_1 <= (y_r*w_r) - (y_i*w_i);
product_2 <= (y_r*w_i) + (y_i*w_r);
A_r <= x_r + product_1(7 downto 0);
A_i <= x_i + product_2(7 downto 0);
B_r <= x_r - product_1(7 downto 0);
B_i <= x_i - product_2(7 downto 0);
DUT:process
begin
x_r <="00000110";
y_r <="00000100";
x_i <="00000010";
y_i <="00000001";
w_r <="00000001";

w_i <="00000000";
wait;
end process DUT;
end architecture;

2). Represent the FFT process by means of a data flow graph.

Imaginary_part of Output-A

Imaginary_part of Output-B

3). Represent the FFT process by means of a scheduled sequencing graph considering
ASAP and ALAP schedules.

ASAP:(As Soon As Possible) Here some computations are performed in the first cycle

ALAP:(As Late As Possible) Here some computations need not be computed in the first cycle.

4. Assume that only one ALU (for addition. comparison, etc) and one multiplier is available.
Draw a scheduled sequencing graph for the process taking resource constraints into
consideration and considering minimum number of clock cycles.
When there are only one multiplier and one ALU it takes minimum 8-clock
cycles to complete the entire operation. So the latency of the sequencing graph is =
8*T

5). Obtain two designs of the FFT processor from problem 2 and problem 4 and
realize a structural model of the processor in VHDL.
---------------------------------------------------- DATA FLOW STRUCTURAL model of FFT algorithm
-- using 2 ALUs and 2 multipliers
---------------------------------------------------

LIBRARY IEEE;
USE IEEE.STD_LOGIC_1164.ALL;
USE IEEE.STD_LOGIC_UNSIGNED.ALL;
USE IEEE.STD_LOGIC_ARITH.ALL;

entity data_flow_fft_struct is
GENERIC (
N

: INTEGER := 8

);
port (
clk
reset

: in std_logic;
: in std_logic;

data_valid_in : in std_logic;
ain_r

: inout std_logic_vector(N-1 downto 0); -- real input data

bin_r

: inout std_logic_vector(N-1 downto 0); -- real input data

ain_i

: inout std_logic_vector(N-1 downto 0); -- imaginary input data

bin_i

: inout std_logic_vector(N-1 downto 0); -- imaginary input data

w_r

: inout std_logic_vector(N-1 downto 0);

w_i

: inout std_logic_vector(N-1 downto 0);

data_valid_out : out std_logic;

aout_r

: out std_logic_vector(N-1 downto 0);-- real output data

aout_i

: out std_logic_vector(N-1 downto 0);-- imaginary output data

bout_r

: out std_logic_vector(N-1 downto 0);-- real output data

bout_i

: out std_logic_vector(N-1 downto 0) -- imaginary output data

);
end data_flow_fft_struct;

architecture behav of data_flow_fft_struct is

signal mul1_in_1,mul1_in_2,mul2_in_1,mul2_in_2:std_logic_vector(N-1 DOWNTO


0);
signal ALU1_in_1,ALU1_in_2,ALU2_in_1,ALU2_in_2:std_logic_vector(N-1 downto
0);
signal mul1_out,mul2_out:std_logic_vector((2*N-1) downto 0);
signal ALU1_out,ALU2_out:std_logic_vector(N-1 downto 0);
signal mul1_out_b,mul2_out_b:std_logic_vector(N-1 downto 0);
signal ALU1_out_b,ALU2_out_b:std_logic_vector(N-1 downto 0);
signal a_r_t,a_i_t:std_logic_vector(N-1 downto 0);
signal Add_sub_1,Add_sub_2

type Times is ( time1,


time2,
time3,
time4,
time5,

: std_logic;

time6
);
signal next_state : times;
begin
process(clk,reset)
begin
if( reset ='0' ) then
a_r_t

<= (others =>'0');

a_i_t

<= (others =>'0');

aout_r

<= (others =>'0');

bout_r

<= (others =>'0');

aout_i

<= (others =>'0');

bout_i

<= (others =>'0');

Add_sub_1

<= '1';

Add_sub_2

<= '1';

data_valid_out <= '0';


next_state

<= time1;

elsif(clk = '1' and clk'event) then

case next_state is
when time1 =>
if (data_valid_in='1')then
mul1_in_1 <= bin_r;
mul1_in_2 <= w_r;

mul2_in_1 <= bin_r;


mul2_in_2 <= w_i;
next_state <= time2;
else
next_state<=time1;
end if;
when time2 =>
mul1_out_b <= mul1_out( 7 downto 0);
mul2_out_b <= mul2_out( 7 downto 0);

mul1_in_1 <= bin_i;


mul1_in_2 <= w_i;
mul2_in_1 <= bin_i;
mul2_in_2 <= w_r;
next_state <= time3;
when time3 =>
Add_sub_1 <='0'; -- subtractor-1
ALU1_in_1 <= mul1_out_b;
ALU1_in_2 <= mul1_out(7 downto 0);

Add_sub_2 <='1'; -- Adder-2


ALU2_in_1 <= mul2_out_b;
ALU2_in_2 <= mul2_out(7 downto 0);
next_state <= time4;
when time4 =>
ALU1_out_b <= ALU1_out;

ALU2_out_b <= ALU2_out;


Add_sub_1 <= '1';
ALU1_in_1 <= ain_r;
ALU1_in_2 <= ALU1_out_b;
Add_sub_2 <='1';
ALU2_in_1 <= ain_i;
ALU2_in_2 <= ALU2_out_b;
next_state <= time5;
when time5 =>
a_r_t <= ALU1_out;
a_i_t <= ALU2_out;

Add_sub_1<='0';
ALU1_in_1 <= ain_r;
ALU1_in_2 <= ALU1_out_b;
Add_sub_2<='0';
ALU2_in_1 <= ain_i;
ALU2_in_2 <= ALU2_out_b;
next_state <= time6;
when time6 =>
aout_r <= a_r_t;
aout_i <= a_i_t;
bout_r <= ALU1_out;
bout_i <= ALU2_out;
data_valid_out <='1';
next_state <= time6;

when others => NULL;


end case;
end if;
end process;
ALU1_out <= ALU1_in_1 + ALU1_in_2 when (Add_sub_1='1') else
ALU1_in_1 - ALU1_in_2;
ALU2_out <= ALU2_in_1 + ALU2_in_2 when Add_sub_2='1' else
ALU2_in_1 - ALU2_in_2;
mul1_out <= mul1_in_1 * mul1_in_2;
mul2_out <= mul2_in_1 * mul2_in_2;
DUT : process
begin
ain_r <= "00000110";
ain_i <= "00000010";
bin_r <= "00000100";
bin_i <= "00000001";
w_r <= "00000001";
w_i <= "00000000";
wait;
end process;
end behav;
Result:-

----------------------------------------------------- STRUCTURAL MODEL OF RESOURCE CONSTRAINT MODEL


--------------------------------------------------LIBRARY IEEE;
USE IEEE.STD_LOGIC_1164.ALL;
USE IEEE.STD_LOGIC_UNSIGNED.ALL;
USE IEEE.STD_LOGIC_ARITH.ALL;

entity resrc_constraint_fft is
GENERIC ( N : INTEGER := 8
);
PORT ( x_r : inout std_logic_vector(N-1 downto 0);
x_i : inout std_logic_vector(N-1 downto 0);
y_r : inout std_logic_vector(N-1 downto 0);
y_i : inout std_logic_vector(N-1 downto 0);

w_r : inout std_logic_vector(N-1 downto 0);


w_i : inout std_logic_vector(N-1 downto 0);
A_r : out std_logic_vector(N-1 downto 0);
A_i : out std_logic_vector(N-1 downto 0);
B_r : out std_logic_vector(N-1 downto 0);
B_i : out std_logic_vector(N-1 downto 0);
clk : in std_logic;
reset : in std_logic;
valid_in : in std_logic;
valid_out : out std_logic
);
end resrc_constraint_fft;

architecture behav of resrc_constraint_fft is

signal ALU_in_1,ALU_in_2:std_logic_vector(N-1 downto 0);


signal mul_in_1,mul_in_2:std_logic_vector(N-1 downto 0);
signal ALU_out,ALU_out_b_1:std_logic_vector(N-1 downto 0);
signal mul_out:std_logic_vector(2*N-1 downto 0);
signal mul_out_b:std_logic_vector(N-1 downto 0);
signal Add_sub:std_logic;
signal a_r_t,a_i_t,b_r_t:std_logic_vector(N-1 downto 0);

type times is ( time1,


time2,
time3,

time4,
time5,
time6,
time7,
time8,
time9
);
signal next_state:times;

begin
process(reset,clk)
begin
if (reset = '0') then
A_r <= (others=>'0');
A_i <= (others=>'0');
B_r <= (others=>'0');
B_i <= (others=>'0');
-- Add_sub <= (others =>'0');
a_r_t <= (others =>'0');
a_i_t <= (others =>'0');
b_r_t <= (others =>'0');
--

valid_out <= (others =>'0');


next_state <= time1;

elsif (clk = '1' and clk'event ) then


case next_state is
when time1 =>

if valid_in = '1' then


mul_in_1 <= y_r;
mul_in_2 <= w_r;
next_state <= time2;
else
next_state <= time1;
end if;
valid_out <='0';
when time2 =>
mul_out_b <= mul_out(7 downto 0);
mul_in_1 <= y_i;
mul_in_2 <= w_i;
next_state <= time3;
when time3 =>
Add_sub <= '0'; -- sub
ALU_in_1 <= mul_out_b;
ALU_in_2 <= mul_out(7 downto 0);
next_state <= time4;
when time4 =>
ALU_out_b_1 <= ALU_out;
Add_sub <='1'; --Add;
ALU_in_1 <= x_r;
ALU_in_2 <= ALU_out;

mul_in_1 <= y_i;


mul_in_2 <= w_r;

next_state <= time5;


when time5 =>
a_r_t <= ALU_out;
mul_out_b <= mul_out(7 downto 0);
Add_sub <='0'; -- sub
ALU_in_1 <= x_r;
ALU_in_2 <= ALU_out_b_1;

mul_in_1 <= y_r;


mul_in_2 <= w_i;
next_State <= time6;
when time6 =>
b_r_t <= ALU_out;
Add_sub <='1'; -- Add
ALU_in_1 <= mul_out_b;
ALU_in_2 <= mul_out(7 downto 0);
next_state <= time7;
when time7=>
ALU_out_b_1 <= ALU_out;
Add_sub <='1'; --Add
ALU_in_1 <= x_i;
ALU_in_2 <= ALU_out;
next_state <= time8;
when time8 =>
a_i_t <= ALU_out;
Add_sub <='0'; --sub

ALU_in_1 <= x_i;


ALU_in_2 <= ALU_out_b_1;
next_state <= time9;
when time9 =>
A_r <= a_r_t;
A_i <= a_i_t;
B_r <= b_r_t;
B_i <= ALU_out;
valid_out <='1';
next_state <= time9;
when others => NULL;
end case;
end if;
end process;
-- ADDER AND MULTIPLIER are assumed to be implimented
ALU_out <= ALU_in_1 + ALU_in_2 when (Add_sub='1') else
ALU_in_1 - ALU_in_2;
mul_out <= mul_in_1 * mul_in_2;

DUT:process
begin
x_r <= "00000110";
x_i <= "00000010";
y_r <= "00000100";
y_i <= "00000001";
w_r <= "00000001";

w_i <= "00000000";


wait;
end process DUT;
end behav;
Result:-

6). Synthesize the models in problem 5 on Altera Quartus II synthesis tool and
compare the resource utilization summary in the two cases.
Synthesized the problems in Quartus II by selecting EP2C20F484c7 FPGA
Problem-2 uses the resources as follows
Total=126/18,752 (<1%)
Combinational with no registers =27

Registers only =61


Combinational with registers =38

Problem-4 uses the resources as follows


Total=158/18,752 (<1%)
Combinational with no registers =37
Registers =56
Combinational with registers =65

Data flow graph uses more number of resources than the resource constraint sequencing graph.

You might also like