DSP Arithmetic


Contents
Fixed point representation
Floating point representation
Some math operations: addition, subtraction, multiplication, division
Comparison

Introduction
Practical DSP implementations should take into account:
Possible quantization errors
Arithmetic errors
Possible overflow

A DSP processor's data format determines its ability to handle signals of different precisions, dynamic ranges, and signal-to-quantization-noise ratios (SQNRs). In order to write efficient programs for DSP applications, we must understand how the processor manipulates data.

FIXED POINT NOTATION



Some Fixed-point processors:


TMS320C64xx processors
ADSP2101 processor

Fixed point notation


Fixed point DSPs usually represent each number with a minimum of 16 bits, although a different length can be used. There are four common ways that these $2^{16}$ (65,536) possible bit patterns can represent a number:
Unsigned integer
Signed integer
Unsigned fraction
Signed fraction

Fixed Point Representation


In unsigned integer notation, the stored number can take on any integer value from 0 to 65,535. Example: Consider a 4-bit representation. We can represent numbers in the range 0 to 15: $4_{10} = 0100_2$, $7_{10} = 0111_2$, $8_{10} = 1000_2$. However, if the result of an arithmetic operation exceeds $15_{10}$, overflow occurs.

Fixed Point Representation


Similarly, signed integer notation uses two's complement to make the range include negative numbers, from -32,768 to 32,767. The first (most significant) bit is used as the sign bit. Example: Consider a 4-bit representation. We can represent numbers in the range -8 to 7: $5_{10} = 0101_2$, $-5_{10} = 1011_2$.
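The 4-bit example can be checked with a short sketch. The helper names `to_twos_complement` and `from_twos_complement` are illustrative only and do not come from the slides.

```python
def to_twos_complement(value, bits=4):
    """Return the two's-complement bit string of `value` using `bits` bits."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    if not lo <= value <= hi:
        raise OverflowError(f"{value} does not fit in {bits} signed bits")
    # Masking a negative Python int yields its two's-complement bit pattern.
    return format(value & ((1 << bits) - 1), f"0{bits}b")


def from_twos_complement(pattern):
    """Interpret a bit string as a two's-complement signed integer."""
    raw = int(pattern, 2)
    # A leading 1 is the sign bit: subtract 2**bits to recover the negative value.
    return raw - (1 << len(pattern)) if pattern[0] == "1" else raw


print(to_twos_complement(5))         # 0101
print(to_twos_complement(-5))        # 1011
print(from_twos_complement("1011"))  # -5
```

Asking for a value outside -8 to 7 with the default width raises `OverflowError`, which mirrors the overflow condition described for the integer formats.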

Fractional fixed point notation


Used for representing numbers with both integer and fractional parts. The Qm.n convention uses m bits to represent the integer portion of the number and n bits to represent the fractional portion. Total number of bits: N = m + n + 1, where the extra bit is the sign bit.

Word layout (most significant bit first): 1 sign bit | m integer bits | radix point | n fractional bits

Q15 format
For example, a 16-bit number that uses 1 sign bit and 15 bits for the fractional part is called the Q0.15 format, or simply the Q15 format. The Q15 format is commonly used in DSP systems, and data must be properly scaled so that their values lie between -1 and 0.999969482421875 ($1 - 2^{-15}$).
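As a quick check of the Q15 limits, the range of any signed Qm.n format follows directly from the bit counts. This is a minimal sketch; `q_range` is a hypothetical helper name.

```python
def q_range(m, n):
    """Return (minimum, maximum, step) of a signed Qm.n number (m + n + 1 bits)."""
    step = 2.0 ** -n                 # weight of the least significant bit
    minimum = -(2.0 ** m)            # sign bit set, all other bits clear
    maximum = 2.0 ** m - step        # sign bit clear, all other bits set
    return minimum, maximum, step


lo, hi, step = q_range(0, 15)
print(lo, hi)   # -1.0 0.999969482421875
```

The same helper gives the range -1 to 0.875 for the 4-bit Q0.3 format used in the later examples.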

Fractional Fixed point notation


With unsigned fraction notation, the 65,536 levels are spread uniformly between 0 and 1.

Example: Consider 3-bit unsigned fraction representation.

Number    Decimal fraction    Fractional notation
0         0                   000
1         1/8                 001
2         2/8                 010
3         3/8                 011
4         4/8                 100
5         5/8                 101
6         6/8                 110
7         7/8                 111

Fractional Fixed point notation


Lastly, the signed fraction format allows negative numbers, equally spaced between -1 and 1.

Example: Consider 3-bit signed fraction representation.

Number    Decimal fraction    Fractional notation
 3         3/4                011
 2         2/4                010
 1         1/4                001
 0         0                  000
-1        -1/4                111
-2        -2/4                110
-3        -3/4                101
-4        -1                  100

Example
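Represent the decimal number 0.95624 as
a Q3 number, and
a Q4 number.

Q3 number: A Q3 number is a two's complement number with one sign bit and 3 fractional bits.
$0.95624 \times 2^3 = 7.64992$
Rounding to the nearest integer would give 8, which exceeds the largest 4-bit two's complement value, so the number is represented as $7 = 0111_2$.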

Example (contd.)
Q4 number: A Q4 number is a two's complement number with one sign bit and 4 fractional bits.

$0.95624 \times 2^4 = 15.29984$
This number can be rounded to $15 = 01111_2$.

Example (contd.)
Errors in representation:
Case 1: Q3 notation, error = $(7.64992 - 7)/8 = 0.08124$
Case 2: Q4 notation, error = $(15.29984 - 15)/16 = 0.01874$
The error in representing the number is often referred to as coefficient quantization error.
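The two error figures can be reproduced with a small sketch. It assumes round-to-nearest followed by saturation to the largest representable code, which is one common choice and not necessarily the method the slides intend. `quantize` is a hypothetical name.

```python
def quantize(x, n):
    """Quantize x to a signed Q0.n code; return (integer code, representation error)."""
    scale = 1 << n                                # 2**n
    code = round(x * scale)                       # nearest integer code
    code = max(-scale, min(scale - 1, code))      # saturate to the valid code range
    return code, x - code / scale


for n in (3, 4):
    code, err = quantize(0.95624, n)
    print(f"Q{n}: code={code}, error={err:.5f}")
```

For Q3 the nearest code would be 8, which saturates to 7. For Q4 the code 15 fits without saturation, and the extra fractional bit reduces the coefficient quantization error.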

Implementation
Most fixed-point DSP processors use two's complement fractional numbers in different Q formats. However, assemblers only recognize integer values.

The programmer must keep track of the position of the binary point when manipulating fractional numbers in assembly programs. The following steps convert a fractional number in Q format into an integer value that can be recognized by the assembler. Let us see this with an example:


Implementation (contd.)
Assume that the coefficient to be passed to the assembler is 1.18 and that the DSP processor uses Q15 format.
Step 1: Normalize the fractional number to the range determined by the desired Q format.
For Q15 format the range is [-1, 1). Normalize the number to this range. Thus, 1.18/2 = 0.59.

Step 2: Multiply the normalized fractional number by $2^n$, where n is the number of fractional bits.
Multiply 0.59 by $2^{15}$. Thus, $0.59 \times 32{,}768 = 19{,}333.12$.

Step 3: Round the product to the nearest integer.
Round the decimal value 19,333.12 to obtain 19333 = 4B85h.
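The three steps can be sketched as follows. The function name `to_q15` and the saturation of the largest code are assumptions added for illustration.

```python
def to_q15(x):
    """Convert a value in [-1, 1) to its Q15 integer code."""
    if not -1.0 <= x < 1.0:
        raise ValueError("normalize the value to [-1, 1) first")
    code = round(x * (1 << 15))      # Steps 2 and 3: scale by 2**15, then round
    return min(code, (1 << 15) - 1)  # guard: values just below 1 must not round to 32768


coefficient = 1.18
normalized = coefficient / 2         # Step 1: bring the value into [-1, 1)
code = to_q15(normalized)
print(code, format(code & 0xFFFF, "04X"))   # 19333 4B85
```

Because the coefficient was halved in Step 1, the program that uses it has to compensate for the factor of 2 elsewhere, for example by shifting the result left by one bit.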

Implementation (contd.)
The arithmetic result obtained by a DSP processor is in integer form. It can be interpreted as a fractional value by dividing by $2^n$. This is equivalent to shifting the binary point n bits to the left. In DSP implementation, it is not always necessary to use Q.15 format throughout the DSP algorithm; instead, we can use different Q formats for different dynamic range requirements.

Binary addition-Example
Addition of two 4-bit numbers represented in Q3 format:
0.100 (0.5) + 0.011 (0.375) = 0.111 (0.875): no overflow
0.101 (0.625) + 0.011 (0.375) = 1.000 (1): overflow, since 1 is outside the Q3 range
Thus, addition of two numbers in fractional representation can result in overflow.
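Both additions can be reproduced on the integer codes. This is a minimal sketch that assumes 4-bit wraparound arithmetic; `q3_add` is a hypothetical helper, and the Q3 codes are the bit patterns read as integers from -8 to 7.

```python
def q3_add(a, b):
    """Add two Q0.3 codes (integers in -8..7); return (wrapped code, overflow flag)."""
    total = a + b
    overflow = not -8 <= total <= 7
    wrapped = ((total + 8) % 16) - 8   # keep only 4 bits, as the hardware would
    return wrapped, overflow


print(q3_add(4, 3))   # (7, False): 0.5 + 0.375 = 0.875
print(q3_add(5, 3))   # (-8, True): 0.625 + 0.375 wraps to the pattern 1000, read as -1
```

The second case shows why overflow matters: the stored bit pattern 1.000 is interpreted as -1 in two's complement, far from the true sum of 1.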

Binary multiplication-Example
Multiplying two 4-bit numbers in Q.3 format requires a 7-bit word in Q.6 format to store the product, and there is no overflow.
0.111 (0.875) × 0.110 (0.75) = 0.101010 (0.65625)
To store the result in a 4-bit word, we truncate it to the four most significant bits (0.101).
The error is then 0.65625 - 0.625 = 0.03125.
Multiplication in Q format does not result in overflow except in the case of (-1) × (-1) = 1, which is not in the range.
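The product and its truncation can be sketched on the integer codes. `q3_mul` is a hypothetical helper; the right shift by 3 implements the truncation to the four most significant bits.

```python
def q3_mul(a, b):
    """Multiply two Q0.3 codes; return (Q0.6 product, truncated Q0.3 code, error)."""
    product_q6 = a * b                   # Q0.3 x Q0.3 gives Q0.6
    truncated_q3 = product_q6 >> 3       # drop the three least significant bits
    error = product_q6 / 64 - truncated_q3 / 8
    return product_q6, truncated_q3, error


print(q3_mul(7, 6))   # (42, 5, 0.03125)
```

Here 42 is the pattern 101010 (0.65625) and 5 is the pattern 101 (0.625). Calling `q3_mul(-8, -8)` yields a Q0.6 product of 64, that is 1.0, which is the one out-of-range case.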

Binary division
Hardware implementation of division is expensive. Therefore, most processors do not provide a single-cycle divide instruction supported by the hardware.

For an N-bit fractional number, fractional division can be realized by repeating the conditional subtraction instruction (N-1) times.
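The idea of repeated conditional subtraction can be sketched in software. This is a generic shift-and-subtract (restoring) division under the assumption of positive operands with numerator smaller than denominator; it does not model any specific processor instruction, and `frac_div` is a hypothetical name.

```python
def frac_div(num, den, n_bits=16):
    """Divide two positive fractional codes (num < den); return the quotient code.

    The quotient has n_bits - 1 fractional bits, one per conditional subtraction.
    """
    if not 0 <= num < den:
        raise ValueError("requires 0 <= num < den so the quotient stays below 1")
    quotient, remainder = 0, num
    for _ in range(n_bits - 1):
        remainder <<= 1            # shift the partial remainder left by one bit
        quotient <<= 1
        if remainder >= den:       # conditional subtraction
            remainder -= den
            quotient |= 1          # this quotient bit is 1
    return quotient


# 0.25 / 0.5 in Q15: codes 8192 and 16384 give 16384, i.e. 0.5
print(frac_div(8192, 16384))
```

With N = 16 the loop runs 15 times, matching the (N-1) repetitions stated above.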


FLOATING POINT ARITHMETIC


Floating Point Processors


TMS320C3x
TMS320C67x
ADSP2106x

Floating point formats allow numbers to be represented with a large dynamic range. Thus, floating point arithmetic can reduce the problem of overflow that occurs in fixed point arithmetic.

Floating point formats


A binary floating point number X is represented by two signed numbers, the mantissa M and the exponent E:
$X = M \times 2^{E}$
The exponent determines the range of numbers that can be represented, the mantissa the accuracy of the numbers. For example, if the mantissa has 16 bits and the exponent has 8 bits:

Range of numbers that can be represented: $0.5 \times 2^{-128}$ to $(1 - 2^{-15}) \times 2^{128}$

IEEE floating point format


IEEE 754 Standard:
Bit 31: sign s (1 bit) | Bits 30-23: exponent E (8 bits) | Bits 22-0: mantissa F (23 bits)

Fig: Floating point representation (IEEE single precision)

The decimal equivalent, X, of a normalized IEEE floating point number is given by
$X = (-1)^{s} \times (1.F) \times 2^{E-127}$
where
F is the fractional part of the mantissa, stored as an unsigned binary fraction (the sign is carried separately by s)
E is the exponent in excess-127 form
s = 0 for positive numbers, s = 1 for negative numbers
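The single-precision fields can be inspected with Python's standard `struct` module. This is a sketch; `decode_float32` is a hypothetical helper, and the reconstruction formula applies only to normalized numbers (E from 1 to 254).

```python
import struct


def decode_float32(x):
    """Split x (stored as IEEE 754 single precision) into s, E, F and rebuild its value."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))   # raw 32-bit pattern
    s = bits >> 31                  # bit 31: sign
    e = (bits >> 23) & 0xFF         # bits 30-23: exponent in excess-127 form
    f = bits & 0x7FFFFF             # bits 22-0: fraction field
    # X = (-1)^s * (1.F) * 2^(E-127), valid for normalized numbers only
    value = (-1) ** s * (1 + f / 2**23) * 2.0 ** (e - 127)
    return s, e, f, value


s, e, f, value = decode_float32(-12.16)
print(s, e, 1 + f / 2**23, value)
```

For -12.16 the fields are s = 1 and E = 130, with 1.F close to 1.52. The rebuilt value differs slightly from -12.16 because 1.52 is not exactly representable in 23 fraction bits.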

Floating point addition


In order to perform floating point addition, we have to adjust the exponent of the smaller number to match that of the bigger number. Consider

$X = (-1)^{s_1} \times (1.F_1) \times 2^{E_1-127}$ and $Y = (-1)^{s_2} \times (1.F_2) \times 2^{E_2-127}$

$Z = X + Y$

Example
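We are given two floating point numbers:
$X = 2.44 = (-1)^{0} \times 1.22 \times 2^{128-127}$
$Y = -12.16 = (-1)^{1} \times 1.52 \times 2^{130-127}$
Here, $|Y| > |X|$, so the exponent of X is adjusted to match that of Y:
$X + Y = [(-1)^{1} (1.52) + (-1)^{0} (1.22) \times 2^{128-130}] \times 2^{130-127} = -1.215 \times 2^{3} = -9.72$
So, in the result: s = 1, mantissa = 0.215, exp = 3 + 127 = 130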
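The alignment step can be sketched on (sign, mantissa, biased exponent) triples. The operands 2.44 and -12.16 are an assumption chosen to be consistent with the stated result s = 1 and mantissa 0.215. `fp_add` is a hypothetical helper that ignores rounding and special values.

```python
def fp_add(a, b):
    """Add two numbers given as (sign, mantissa in [1, 2), biased exponent)."""
    # Align the operand with the smaller exponent to the larger one.
    big, small = (a, b) if a[2] >= b[2] else (b, a)
    shift = big[2] - small[2]
    total = (-1) ** big[0] * big[1] + (-1) ** small[0] * small[1] / 2**shift
    sign = 1 if total < 0 else 0
    mantissa, exponent = abs(total), big[2]
    if mantissa == 0:
        return 0, 0.0, 0
    # Normalize so the mantissa is back in [1, 2).
    while mantissa >= 2:
        mantissa /= 2
        exponent += 1
    while mantissa < 1:
        mantissa *= 2
        exponent -= 1
    return sign, mantissa, exponent


x = (0, 1.22, 128)    #   2.44 =  1.22 * 2**(128-127)
y = (1, 1.52, 130)    # -12.16 = -1.52 * 2**(130-127)
sign, mantissa, exponent = fp_add(x, y)
print(sign, round(mantissa, 3), exponent)   # 1 1.215 130
```

The result corresponds to -1.215 × 2^3 = -9.72, so the stored fraction is 0.215 with a biased exponent of 130.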

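Floating point multiplication: Example

$X = 2.44 = (-1)^{0} \times 1.22 \times 2^{128-127}$
$Y = -12.16 = (-1)^{1} \times 1.52 \times 2^{130-127}$
$X \times Y = [(-1)^{1} (1.52) \times 2^{130-127}] \times [(-1)^{0} (1.22) \times 2^{128-127}] = -1 \times 1.8544 \times 2^{4} = -29.6704$

So, in the result: s = 1, mantissa = 0.8544, exp = 4 + 127 = 131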

The mantissas of the two numbers are multiplied, while the exponent terms are added without the need to align them.
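A minimal sketch of this rule on (sign, mantissa, biased exponent) triples follows. The operands 2.44 and -12.16 are an assumption chosen to be consistent with a result sign of s = 1. `fp_mul` is a hypothetical helper that ignores rounding, overflow, and special values.

```python
def fp_mul(a, b):
    """Multiply two numbers given as (sign, mantissa in [1, 2), biased exponent)."""
    sign = a[0] ^ b[0]                  # signs combine by exclusive OR
    mantissa = a[1] * b[1]              # product lies in [1, 4)
    exponent = a[2] + b[2] - 127        # add exponents, remove the duplicated bias
    if mantissa >= 2.0:                 # at most one normalization shift is needed
        mantissa /= 2.0
        exponent += 1
    return sign, mantissa, exponent


x = (0, 1.22, 128)    #   2.44
y = (1, 1.52, 130)    # -12.16
sign, mantissa, exponent = fp_mul(x, y)
print(sign, round(mantissa, 4), exponent)   # 1 1.8544 131
```

No exponent alignment is needed before the operation, in contrast with addition. Only the final normalization can change the exponent.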

Most floating point processors perform automatic normalization so that numbers are properly shifted and aligned. The programmer just needs to take care of the overflow problem. However, because of the large dynamic range, scaling is rarely necessary. Hence, floating point processors are easier to use than fixed point processors.

COMPARISON
Between Fixed Point and Floating point notations

Comparison
Fixed point                                                      Floating point
16- or 24-bit devices                                            32-bit devices
Limited dynamic range                                            Large dynamic range
Overflow and quantization errors must be resolved.               Easier to program as no scaling is required.
Poorer C compiler efficiency; normally programmed in assembly.   Better C compiler efficiency; can be developed in C.

Comparison
Fixed point                                                 Floating point
Faster clock rate                                           Slower clock rate
Functional units are simpler, less silicon area required.   Functional units are complex, more silicon area required.
Cheaper                                                     More expensive
Lower power consumption                                     Higher power consumption

References
Sen M. Kuo, Woon-Seng S. Gan, Digital Signal Processors: Architectures, Implementations, and Applications
Emmanuel Ifeachor, Barrie W. Jervis, Digital Signal Processing
Steven W. Smith, The Scientist and Engineer's Guide to Digital Signal Processing
