VHDL Implementation of Wavelet Packet Transforms Using SIMULINK Tools
Mukul Shirvaikar and Tariq Bushnaq
Electrical Engineering Department
The University of Texas at Tyler
Tyler, TX 75799
e-mail: [email protected]
ABSTRACT
The wavelet transform is currently being used in many engineering fields. The real-time implementation of the Discrete
Wavelet Transform (DWT) is a current area of research as it is one of the most time-consuming steps in the JPEG2000
standard. The standard implements two different wavelet transforms: irreversible and reversible Daubechies. The former
is a lossy transform, whereas the latter is a lossless transform. Many current JPEG2000 implementations are software-
based and not efficient enough to meet real-time deadlines. Field Programmable Gate Arrays (FPGAs) are
revolutionizing image and signal processing. Many major FPGA vendors like Altera and Xilinx have recently developed
SIMULINK tools to support their FPGAs. These tools are intended to provide a seamless path from system-level
algorithm design to FPGA implementation. In this paper, we investigate FPGA implementation of 2-D lifting-based
Daubechies 9/7 and Daubechies 5/3 transforms using a Matlab/Simulink tool that generates synthesizable VHSIC
Hardware Description Language (VHDL) code. The goal is to study the feasibility of this approach for real-time image
processing by comparing the performance of the high-level toolbox with a handwritten VHDL implementation. The
hardware platform used is an Altera DE2 board with a 50MHz Cyclone II FPGA chip and the Simulink tool chosen is
DSPBuilder by Altera.
1. INTRODUCTION
There is growing demand for high-quality digital images and video over bandwidth-limited channels such as the
Internet and cellular phone networks. The resulting volume of media puts pressure on the capacity of current
communication systems. Instead of replacing the physical infrastructure to
reach higher bandwidth, it is more economical to develop image compression schemes that represent images in a
compact form and can be used over the existing communication systems. The JPEG standard has become one of
the best-known image compression techniques and is used in many applications. The compression standard was
developed by the Joint Photographic Experts Group (JPEG), whose abbreviation became the name of the standard. When
applied to a still image, the standard uses the Discrete Cosine Transform (DCT), which applies cosine functions of
different frequencies for data analysis and decorrelation. The DCT yields the transform-domain coefficients, which are
then quantized and entropy encoded 1. JPEG is a lossy technique that delivers a compression ratio in the range of 10:1 to 20:1
while preserving acceptable image quality for most consumer applications 2. However, when the desired compression
ratio is 24:1 or above, the decrease in available bits allows only the average pixel values of the 8 × 8 blocks to be encoded 2.
The decompressed image is constructed from these 8 × 8 blocks, which creates undesired checkerboard effects in the
reproduction.
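The checkerboard effect can be illustrated with a minimal sketch. It assumes that at very low bit rates only the block average (the DC term of the DCT, which is proportional to the block mean) survives, and it simply replaces each 8 × 8 tile by its mean. The function name and the test image are hypothetical and are not taken from the JPEG standard.

```python
def block_mean_reconstruction(image, block=8):
    """Replace every block x block tile of a 2-D list by its rounded mean."""
    height, width = len(image), len(image[0])
    out = [[0] * width for _ in range(height)]
    for r0 in range(0, height, block):
        for c0 in range(0, width, block):
            rows = range(r0, min(r0 + block, height))
            cols = range(c0, min(c0 + block, width))
            total = sum(image[r][c] for r in rows for c in cols)
            mean = round(total / (len(rows) * len(cols)))
            # Every pixel of the tile collapses to the same value.
            for r in rows:
                for c in cols:
                    out[r][c] = mean
    return out


# A smooth horizontal ramp (hypothetical test image): value = 2 * column index.
ramp = [[2 * c for c in range(16)] for _ in range(16)]
coarse = block_mean_reconstruction(ramp)
distinct_levels = sorted({v for row in coarse for v in row})
```

The smooth ramp collapses to one flat value per tile, so a visible step appears at every block boundary.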
For this reason and others, such as variable-rate data stream compression, the Joint Photographic Experts Group has
developed a new image compression standard called JPEG2000 3. The JPEG2000 standard provides a better compression
ratio than the original JPEG standard 4. JPEG2000 also includes extra features that were not available in previous
standards, such as allowing different compression schemes for digital images. Although it does not specify a particular
compression algorithm at the heart of the standard, the Discrete Wavelet Transform (DWT) has replaced the DCT in
In Figure 1, the original image (a) was compressed by a ratio of approximately 88:1 using JPEG and JPEG2000. The
JPEG image in (b) is degraded by the checkerboard effect mentioned earlier, while the JPEG2000 standard results in
superior image quality, as shown in (c). All these features and more qualify JPEG2000 to be the image compression
standard of choice for the future.
These powerful features come at the expense of processing complexity. JPEG2000 implementations are up to six times
more computationally complex than JPEG 6. Software implementations of JPEG2000 have the advantage of being
flexible; however, they may not be suitable for meeting the hard deadlines of most real-time systems. On the other hand,
hardware implementations such as ASICs offer high performance in terms of speed but lack the flexibility of software
implementations on DSP platforms. A compromise between these two is the use of Field Programmable Gate Arrays
(FPGAs).
[Figure: JPEG2000 benchmark breakdown into DWT, entropy coding, and other stages for lossless and lossy modes. (a) Software JPEG2000 benchmark (b) Hardware JPEG2000 benchmark]
The introduction of FPGAs adds the engineering burden of CAD tools and Hardware Description Languages (HDLs).
The knowledge and expertise of engineering personnel in this area can be a limiting factor in meeting the current market
demand for hardware implementations of algorithms such as the DWT. MATLAB is currently one of the most popular
engineering software packages, and it provides a powerful graphical system design toolbox known as SIMULINK. With this
in mind, major FPGA vendors like Altera and Xilinx have recently developed SIMULINK tools to support their FPGAs.
These tools are intended to provide a seamless path from system-level algorithm design to FPGA implementation.
In this paper, we specifically investigate FPGA implementation of 2-D lifting-based Daubechies 9/7 and Daubechies 5/3
transforms using a MATLAB/SIMULINK tool that generates synthesizable VHDL code. The goal is to study the
feasibility of this approach for real-time image processing by comparing the performance of the high-level toolbox with a
handwritten VHDL implementation. The hardware platform used in this experiment is an Altera DE2 FPGA board with
8MB of SDRAM, 4MB of flash memory, an SD card slot, and a 50MHz Cyclone II FPGA chip. The Simulink tool chosen
is DSPBuilder by Altera, which has an extensive library of building blocks. The design flow allows users to
convert designs to VHDL, synthesize them, and program the board using a single GUI. The results of the transform can be
displayed on the 16x2 LCD display mounted on the board and are compared to the correct values from MATLAB
implementations. The SIMULINK implementation is compared with a handwritten VHDL implementation for both
transforms by evaluating FPGA hardware utilization (number of logical components) and execution time.
The first phase in the JPEG2000 encoding process is two levels of discrete wavelet decomposition 6. As mentioned
earlier, JPEG2000 implements a reversible and an irreversible DWT. The inputs to the DWT stage are tiles of the image data
and the number of decomposition levels L. The wavelet transform is performed using either the 9/7 floating-point wavelet discussed
in Antonini et al. 7, or the 5/3 integer wavelet discussed in Calderbank et al. 8. Progression is possible with either wavelet,
but the 5/3 wavelet must be used to progress to a lossless result, as mentioned in Marcellin et al. 9. It is important to keep in mind
that all wavelet transforms described in the JPEG2000 standard are one-dimensional. Therefore, in order to perform a 2-D DWT
The second stage uses the even samples to predict the odd ones by means of equation (1.1). The more correlated
the data, the closer the predicted values are to the odd samples.
[Figure: block diagram of the lifting scheme (split, predict, update) producing the wavelet and scaling coefficients]
The final wavelet coefficients are the difference between the original odd samples and the predicted ones, which is the
outcome of equation (1.1). If the input data were perfectly correlated, the outcome of equation (1.1) would be zero.
Therefore, the wavelet coefficients capture the high-frequency content (deviation from the DC level) of the image.
In the final phase the even samples are lifted using the wavelet coefficients and computed as follows:
$$Y(2n) = X_{ext}(2n) + \left[ \frac{Y(2n-1) + Y(2n+1) + 2}{4} \right] \tag{1.2}$$
The scaling coefficients are the outcome of equation (1.2). These coefficients preserve the DC level of the input data
and capture the low-frequency component of the image.
i0      i_left   i_right
even    2        1
odd     1        2
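The predict and update stages described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the hardware design: equation (1.1) is not reproduced in this text, so the predict step below uses the usual reversible 5/3 form (odd sample minus the floored average of its even neighbours); the brackets in equation (1.2) are read as a floor; the input length is assumed even; and mirror indexing at the boundaries stands in for an explicit symmetric extension. All function names are hypothetical.

```python
def mirror(i, n):
    """Whole-sample symmetric extension: reflect index i into 0..n-1."""
    if i < 0:
        return -i
    if i >= n:
        return 2 * (n - 1) - i
    return i


def dwt53_forward(x):
    """One level of 5/3 lifting on an even-length list of integers."""
    n = len(x)
    y = list(x)
    # Predict (assumed form of equation (1.1)): odd samples become
    # wavelet (high-pass) coefficients.
    for i in range(1, n, 2):
        y[i] = x[i] - (x[i - 1] + x[mirror(i + 1, n)]) // 2
    # Update, equation (1.2): even samples become scaling (low-pass)
    # coefficients, lifted by the neighbouring wavelet coefficients.
    for i in range(0, n, 2):
        y[i] = x[i] + (y[mirror(i - 1, n)] + y[mirror(i + 1, n)] + 2) // 4
    return y[0::2], y[1::2]  # (scaling, wavelet)


def dwt53_inverse(scaling, wavelet):
    """Undo dwt53_forward by running the lifting steps in reverse order."""
    n = 2 * len(scaling)
    y = [0] * n
    y[0::2], y[1::2] = scaling, wavelet
    x = list(y)
    for i in range(0, n, 2):  # undo update
        x[i] = y[i] - (y[mirror(i - 1, n)] + y[mirror(i + 1, n)] + 2) // 4
    for i in range(1, n, 2):  # undo predict
        x[i] = y[i] + (x[i - 1] + x[mirror(i + 1, n)]) // 2
    return x
```

For a constant input the wavelet coefficients are all zero and the scaling coefficients equal the input level, which matches the DC-preserving behaviour described above. Because every step uses integer arithmetic and is undone exactly, the round trip is lossless.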
[Figure: block diagram of the lifting scheme (split, predict, update stages) producing the wavelet and scaling coefficients]
Equations (2.1) through (2.6) describe the different stages of the transform, namely predict I, update I, predict II, update II,
scale I, and scale II, which yield the final wavelet and scaling coefficients. The coefficient values as defined by the JPEG2000
standard and used in the equations below are shown in Table 2.
Predict I: $$Y(2n+1) = X_{ext}(2n+1) + \left[ \alpha \times \left[ X_{ext}(2n) + X_{ext}(2n+2) \right] \right] \tag{2.1}$$
Scale II: $$Y(2n) = \left( \frac{1}{K} \right) \times Y(2n) \tag{2.6}$$
Finally, as mentioned in the discussion of the Daubechies 5/3 DWT, Table 3 includes the minimum symmetric extension
values as described in the JPEG2000 1D_EXTR procedure. The extension supplies the samples beyond the signal
boundaries that the lifting steps need when processing a finite-length signal.
i0      i_left   i_right
even    4        3
odd     3        4
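A floating-point sketch of the six lifting stages may help to make the structure concrete. Several assumptions apply. Only equations (2.1) and (2.6) appear in this text, so the intermediate steps follow the usual lifting factorization of the 9/7 filter. Table 2 is not reproduced here, so the constants below are the commonly published 9/7 lifting values and should be checked against that table. The sign convention of the scale I step varies between references, and a positive K is assumed. Mirror indexing replaces the explicit extension, and the input length is assumed even. The names are hypothetical.

```python
# Commonly published 9/7 lifting constants (assumed; verify against Table 2).
ALPHA = -1.586134342059924
BETA = -0.052980118572961
GAMMA = 0.882911075530934
DELTA = 0.443506852043971
K = 1.230174104914001


def mirror(i, n):
    """Whole-sample symmetric extension: reflect index i into 0..n-1."""
    if i < 0:
        return -i
    if i >= n:
        return 2 * (n - 1) - i
    return i


def lift(y, start, coeff):
    """Add coeff * (left + right neighbour) to every second sample, in place."""
    n = len(y)
    for i in range(start, n, 2):
        y[i] += coeff * (y[mirror(i - 1, n)] + y[mirror(i + 1, n)])


def dwt97_forward(x):
    """One level of 9/7 lifting on an even-length sequence."""
    y = [float(v) for v in x]
    lift(y, 1, ALPHA)  # predict I, equation (2.1)
    lift(y, 0, BETA)   # update I
    lift(y, 1, GAMMA)  # predict II
    lift(y, 0, DELTA)  # update II
    wavelet = [v * K for v in y[1::2]]  # scale I (sign convention assumed)
    scaling = [v / K for v in y[0::2]]  # scale II, equation (2.6)
    return scaling, wavelet
```

With these constants a constant input yields wavelet coefficients that are zero up to rounding error and scaling coefficients equal to the input level, which is a quick sanity check on any implementation of the stages.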
3. MATLAB/SIMULINK
Nowadays, many DSP algorithms are being implemented on FPGAs instead of traditional ASICs or DSP processors. The
main FPGA producers in the market are Altera and Xilinx, and both companies have reported revenues greater than $1
billion 14. A new visual design methodology introduced by both companies is based on the popular and established
Matlab/Simulink interface. This approach is promising because of the large number of Matlab programmers, which is
over 1 million worldwide 14. Xilinx has introduced its own Simulink tool, called System Generator, and Altera
has developed DSPBuilder. In this paper we use DSPBuilder in the experiment because of its numerous features, which
are discussed below and later in this paper.
The Altera library includes 171 design blocks arranged in 11 subgroups. These building blocks allow the user to design
DSP systems without the need to import any custom Matlab or HDL code into the design 14. In addition, DSPBuilder has
a smooth and intuitive design flow. The program is fully integrated with Simulink and does not require the user to
use any other support or third-party software throughout the entire design process. DSPBuilder provides a single
graphical user interface (GUI) that allows the user to perform synthesis, fitting, and board programming. The program
offers a collection of board components such as LEDs, DIP switches, and A/D and D/A converters. These blocks
eliminate the need for manual pin assignment and configuration for the different components on the
development kit. Figure 5 shows some of the subcategories and building blocks in DSPBuilder.
4. EXPERIMENTS
The objective of the experiments in this section is to evaluate the Quality of Results (QoR) of the Simulink tool
compared to the traditional hand-written HDL. To do so, we implement Daubechies 5/3 and 9/7 using both hand-written
VHDL and DSPBuilder and compare the performance of each implementation in terms of speed and number of logical
units.
[Figure 5: Simulink Library Browser listing the Altera DSP Builder Blockset, with blocks such as AltBus, Avalon-ST Source, Binary Point Casting, Binary to Seven Segments, Bus Builder, and Bus Concatenation]
The top entity 2-D DWT provides the starting offset and the step size to the 1-D DWT entity. The 1-D DWT entity
applies DWT on either a given row or column from the image data. The implementation does not reorder the results of
the 1-D transform, since we are only interested in a performance comparison. The design applies the symmetric extension
described earlier and then uses the sliding window method to calculate the scaling and wavelet coefficients 15. The
standard approach to implementing DWT requires several passes for each row to implement each lifting and scaling
stage in the transform. The Daubechies 5/3 would require two passes for each row and Daubechies 9/7 would require six
passes for each row to avoid the need to buffer the entire row of data 15. The sliding window method only requires one
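The offset and step-size interface between the 2-D and 1-D entities can be sketched in software as follows. This is an illustration of the addressing scheme only, not of the sliding window method, and it makes several assumptions: the image is held in a flat row-major list, the 5/3 lifting steps are used with floor division, rows are processed before columns, and dimensions are even. As in the design described above, the coefficients are left interleaved in place rather than reordered. The function names are hypothetical.

```python
def lift53_strided(mem, offset, step, n):
    """In-place 5/3 lifting over the n samples mem[offset + k * step]."""
    def at(k):
        # Mirror indexing at both ends replaces explicit extension.
        if k < 0:
            k = -k
        elif k >= n:
            k = 2 * (n - 1) - k
        return offset + k * step

    for k in range(1, n, 2):  # predict: odd positions
        mem[at(k)] -= (mem[at(k - 1)] + mem[at(k + 1)]) // 2
    for k in range(0, n, 2):  # update: even positions
        mem[at(k)] += (mem[at(k - 1)] + mem[at(k + 1)] + 2) // 4


def dwt53_2d(mem, width, height):
    """Apply the 1-D transform to every row, then to every column."""
    for r in range(height):
        lift53_strided(mem, r * width, 1, width)   # row r: offset r*width, step 1
    for c in range(width):
        lift53_strided(mem, c, width, height)      # column c: offset c, step width


# Hypothetical 4 x 4 constant image stored row by row.
image = [9] * 16
dwt53_2d(image, 4, 4)
```

For the 64 × 64 image used in the experiments, row r would be addressed with offset 64r and step 1, and column c with offset c and step 64, so the same 1-D routine serves both directions.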
The correctness of the transforms is verified by comparing the output of both transforms to that of a handwritten Matlab
algorithm performing the identical operations. The Matlab code implements 2-D DWT and complies with the constraints
and the specifications of the HDL design mentioned earlier. The output of the transform is displayed on the LCD screen
attached to the development kit and compared to the output of the Matlab code. Note that a 64 × 64 image generates
4096 outputs. To save time, we only compare a random row and a column of the transform’s output with the same row
and column from the output of the Matlab program.
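The spot check described above can be sketched as follows, assuming both outputs are available as equally sized 2-D lists. The function name and the optional random generator argument are hypothetical.

```python
import random


def spot_check(hw_output, reference, rng=None):
    """Compare one random row and one random column of two 2-D lists.

    Returns (row, col, ok) so that a failing check can be reproduced.
    """
    rng = rng or random.Random()
    height, width = len(reference), len(reference[0])
    row = rng.randrange(height)
    col = rng.randrange(width)
    row_ok = hw_output[row] == reference[row]
    col_ok = all(hw_output[r][col] == reference[r][col] for r in range(height))
    return row, col, row_ok and col_ok


# Hypothetical 64 x 64 reference data and a matching copy.
reference = [[(r * 64 + c) % 251 for c in range(64)] for r in range(64)]
candidate = [list(r) for r in reference]
row, col, ok = spot_check(candidate, reference, random.Random(1))
```

One row plus one column of a 64 × 64 output covers 127 of the 4096 values, so this check trades coverage for speed.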
Table 4. Design results
Design            VHDL                 DSPBuilder
                  LCs    fmax (MHz)    LCs    fmax (MHz)
Daubechies 5/3    110    203.13        79     250.40
Daubechies 9/7    133    66.08         107    109.0
5. RESULTS
The outputs of each design were compared and verified to ensure correctness of the design. Table 4 shows the results of
the synthesis report of the HDL and DSPBuilder designs. The first column shows the number of logical components
(LC) generated by the DSPBuilder and the VHDL code. In the case of Daubechies 5/3 the DSPBuilder utility generated
79 LCs compared to 110 LCs generated by the hand-written VHDL. The DSPBuilder design uses fewer
logical components, implying a more efficient design. The second column shows the maximum frequency (fmax)
achieved by each design. fmax is the maximum clock frequency that can be achieved without violating setup and hold time
requirements. DSPBuilder again outperforms the VHDL code in the timing analysis. In the case of Daubechies 9/7, the
fmax achieved by the DSPBuilder design is 109 MHz, which corresponds to a 9.17 ns clock period, whereas the VHDL design achieved an fmax
The results of the experiments suggest that DSPBuilder delivers the desired QoR and can be considered a competitive
alternative to the traditional method of FPGA development. An additional advantage of DSPBuilder is that it allows
for faster implementation and hence a shorter time-to-market. As a bonus, it provides automated test benches at no extra cost or
time.
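The arithmetic behind the comparison can be reproduced directly from the values in Table 4. The sketch below converts each fmax to a clock period and derives the relative savings; the dictionary layout and function names are hypothetical, and only the numbers come from the table.

```python
# (logical components, fmax in MHz) taken from Table 4.
RESULTS = {
    "Daubechies 5/3": {"VHDL": (110, 203.13), "DSPBuilder": (79, 250.40)},
    "Daubechies 9/7": {"VHDL": (133, 66.08), "DSPBuilder": (107, 109.0)},
}


def period_ns(fmax_mhz):
    """Minimum clock period in nanoseconds for a given fmax in MHz."""
    return 1000.0 / fmax_mhz


def summarize(results):
    """Derive periods, LC savings, and fmax ratios for each transform."""
    rows = []
    for design, impl in results.items():
        vhdl_lcs, vhdl_fmax = impl["VHDL"]
        dsp_lcs, dsp_fmax = impl["DSPBuilder"]
        rows.append({
            "design": design,
            "vhdl_period_ns": round(period_ns(vhdl_fmax), 2),
            "dsp_period_ns": round(period_ns(dsp_fmax), 2),
            "lc_saving_pct": round(100.0 * (vhdl_lcs - dsp_lcs) / vhdl_lcs, 1),
            "fmax_ratio": round(dsp_fmax / vhdl_fmax, 2),
        })
    return rows


summary = summarize(RESULTS)
```

For the Daubechies 9/7 DSPBuilder design, `period_ns(109.0)` evaluates to roughly 9.17 ns, which agrees with the delay quoted in the text.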
6. CONCLUSION
The wavelet transform is currently being used in many engineering fields. The real-time implementation of the Discrete
Wavelet Transform (DWT) is a current area of research as it is one of the most time-consuming steps in the JPEG2000
standard. Popular realizations of the standard implement two different wavelet transforms: irreversible and reversible
Daubechies. FPGAs are revolutionizing image and signal processing, and major FPGA vendors like Altera and
Xilinx have recently developed SIMULINK tools to support their products. These tools are intended to provide a
seamless path from system-level algorithm design to FPGA implementation. The results of this experiment
demonstrate that, for both transforms tested, the Simulink design flow for FPGAs achieves a better QoR than
hand-written VHDL code and is a viable alternative for DSP design. The extensive DSPBuilder block library is another
strength of this approach. The Quality of Results achieved by DSPBuilder matches or exceeds that of handwritten VHDL,
making Matlab/SIMULINK tools worthy of consideration for DSP algorithm implementation.
ACKNOWLEDGEMENTS
The authors would like to thank Dr. Yea Zong Kuo, Comcept Division, L3 Communications Inc. for her role as a
technical consultant during the course of this project.
REFERENCES
1. Zafarifar, B., “Micro-codable Discrete Wavelet Transform,” Computer Engineering Laboratory, Delft University of
Technology, July 2002.
2. Brennan, J., “FPGA Coprocessing in a JPEG2000 Implementation,” School of Information Technology and
Electrical Engineering, University of Queensland, October 2001.
3. ISO/IEC FCD 15444-1:2000 (V1.0, 16 March 2000), http://www.jpeg.org, (Last visited December 12, 2007).
4. Santa-Cruz, D. and Ebrahimi, T., “An Analytical Study of JPEG 2000 Functionalities,” Proc. Int’l Conf. Image
Processing, IEEE, New Jersey, 2000.
5. Adams, M. D., “The JPEG2000 Still Image Compression Standard,” Department of Electrical and Computer
Engineering, University of Victoria, December 2005.
6. Cantineau, O., “Enabling Real-Time JPEG2000 with FPGA Architecture,” Global Signal Processing Conferences &
Expos (GSPx), International Signal Processing Con., CF-JPG031505-1.0, March 2005.
7. Antonini, M., Barlaud, M., Mathieu, P. and Daubechies, I., “Image coding using wavelet transform,” IEEE Trans. on
Image Proc., vol. 1, no. 2, pp. 205-220, Apr. 1992.
8. Calderbank, A. R., Daubechies, I., Sweldens, W. and Yeo, B.-L., “Wavelet transforms that map integers to
integers,” Appl. Comput. Harmon. Anal., vol. 5, pp. 332-369, July 1998.
9. Marcellin, M. W., Gormish, M. J., Bilgin, A. and Boliek, M. P., “An Overview of JPEG-2000,” Proc. of IEEE Data
Compression Conference, pp. 532-541, 2000.
10. Skodras, A., Christopoulos, C., and Ebrahimi, T., “The JPEG 2000 Still Image Compression Standard,” IEEE Signal
Processing Magazine, pp 36-58, September 2001.
11. Sweldens, W., “The lifting scheme: A new philosophy in biorthogonal wavelet constructions,” Proc. SPIE, vol.
2569, pp. 68-79, Sept. 1995.
12. Adams, M. D. and Kossentini, F., “Reversible Integer-to-integer Wavelet Transforms for Image Compression:
Performance Evaluation and Analysis,” IEEE Transactions on Image Processing, vol. 9, no. 6, June 2000.