Fixed Point vs. Floating Point

Faculty Of Engineering
Communications & Control Engineering Department
Prepared by:
Hind J. Zourob
Heba M. Matter
220033212
220012336
Supervisor:
Dr. Hatem El-Aydi
Digital Signal Processing
Outline
Introduction.
Fixed point processors.
Q Format.
Floating point processors.
Overflow and scaling.
Scaling.
Comparison.
Conclusion.
Introduction
Digital signal processing can be divided into two categories:
Fixed point
Floating point
These categories refer to the format used to store and manipulate numbers within the devices.
Fixed point processors

Represents each number with a minimum of 16 bits.
Numbers are represented and manipulated in integer format.
There are four common ways that these 2^16 = 65,536 possible bit patterns can represent a number:
unsigned integer
signed integer
unsigned fraction notation
signed fraction format
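As a rough illustration of the four readings listed above, the sketch below interprets one 16-bit pattern in each way. It assumes two's complement for the signed cases and a divisor of 65536 (unsigned fraction) or 32768 (signed fraction). These scalings are a common convention, not something the slides specify, and the function name interpret_16bit is hypothetical.

```python
def interpret_16bit(pattern):
    """Return four common readings of a 16-bit pattern (0..0xFFFF)."""
    if not 0 <= pattern <= 0xFFFF:
        raise ValueError("pattern must fit in 16 bits")
    unsigned_int = pattern
    # Assumption: two's complement, so patterns with the top bit set are negative.
    signed_int = pattern - 0x10000 if pattern & 0x8000 else pattern
    unsigned_frac = unsigned_int / 65536   # assumed scaling: 0 to just under 1
    signed_frac = signed_int / 32768       # assumed scaling: -1 to just under 1
    return unsigned_int, signed_int, unsigned_frac, signed_frac

print(interpret_16bit(0xC000))   # (49152, -16384, 0.75, -0.5)
```

The same bit pattern therefore stands for four different values. Only the interpretation chosen by the programmer changes, not the stored bits.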
Fixed point processors usages

There is simply no need for floating-point processing in mobile TVs.
Indeed, floating-point computations would produce a more precise DCT.
However, the DCTs in video codecs are designed to be performed on a fixed-point processor and are bit-exact.
Q Format (Fractional Representation)

In a 16-bit system, it is not possible to represent numbers larger than 32767 or smaller than -32768.
To cope with this limitation, numbers are often normalized to the range between -1 and 1.
This is achieved by moving the implied binary point.
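A minimal sketch of this normalization, assuming the Q15 convention (1 sign bit, 15 fractional bits) and saturation at the limits. The helper names float_to_q15 and q15_to_float are hypothetical, and rounding uses Python's built-in round.

```python
Q15_SCALE = 1 << 15          # 32768: the implied binary point sits after the sign bit
Q15_MAX = 32767
Q15_MIN = -32768

def float_to_q15(x):
    """Quantize x to a Q15 integer, saturating outside [-1, 1)."""
    n = round(x * Q15_SCALE)
    return max(Q15_MIN, min(Q15_MAX, n))

def q15_to_float(n):
    """Read a Q15 integer back as a fraction."""
    return n / Q15_SCALE

print(float_to_q15(0.5))      # 16384
print(float_to_q15(-1.0))     # -32768
print(float_to_q15(1.0))      # 32767 (saturated; +1 itself is not representable)
print(q15_to_float(16384))    # 0.5
```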
Number representations
Examples
Example (1)
Consider two Q15 format numbers that are multiplied.
What is the result, and how will it be stored in 16-bit memory?
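The worked solution on the original slide is not recoverable here, so the following is only a sketch of one common approach. The product of two Q15 numbers has 30 fractional bits (Q30), and shifting it right by 15 keeps the 16 most useful bits as a Q15 result. The function name q15_mul is hypothetical, and the corner case -1 × -1, which does not fit in Q15, is not handled.

```python
def q15_mul(a, b):
    """Multiply two Q15 integers and return the truncated Q15 product."""
    product_q30 = a * b            # needs up to 32 bits; has 30 fractional bits
    return product_q30 >> 15       # discard the 15 least significant fractional bits

a = 16384    # 0.5 in Q15
b = 8192     # 0.25 in Q15
p = q15_mul(a, b)
print(p, p / 32768)     # 4096 0.125

# Very small operands show the cost of discarding bits: the exact
# product 15 / 2**30 is nonzero, but the stored Q15 result is 0.
print(q15_mul(3, 5))    # 0
```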
Example (2)
Example (3)
However, it should be realized that some precision is lost as a result of discarding the smaller fractional bits.
To solve this problem, the scaling approach will be used (discussed later).
Floating point number representation
Use a minimum of 32 bits to store each value.
The represented numbers are not uniformly spaced.
Composed of a mantissa and exponent.
Floating point processors can also support integer representation and calculations.
There are two floating-point data representations on the C67x processor: single precision (SP) and double precision (DP).
Single precision and double precision
C67x floating-point data representation.
C67x double precision floating-point representation
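The figures for these formats did not survive extraction. As a stand-in, the sketch below assumes the IEEE 754 single-precision layout (1 sign bit, 8 exponent bits, 23 fraction bits), which is what Python's struct "f" format uses. Applying it to the C67x is an assumption of this example. It also shows that the gap between neighbouring values grows with magnitude, which is what "not uniformly spaced" means. The function names are hypothetical.

```python
import struct

def decode_single(x):
    """Split x, rounded to single precision, into sign, exponent, fraction."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF     # 8-bit biased exponent (bias 127)
    fraction = bits & 0x7FFFFF         # 23-bit fraction, hidden leading 1 omitted
    return sign, exponent, fraction

def spacing_single(x):
    """Gap between a positive normal x and the next larger single-precision value."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    (cur,) = struct.unpack(">f", struct.pack(">I", bits))
    (nxt,) = struct.unpack(">f", struct.pack(">I", bits + 1))
    return nxt - cur

print(decode_single(-6.5))        # (1, 129, 5242880)
print(spacing_single(1.0))        # 1.1920928955078125e-07
print(spacing_single(1024.0))     # 0.0001220703125
```

Here -6.5 is -1.625 × 2^2, so the biased exponent is 127 + 2 = 129, and the spacing near 1024 is 1024 times the spacing near 1.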
Floating point processors

All steps needed to perform floating-point arithmetic are done by the floating-point hardware.
It is inefficient to perform floating-point arithmetic on fixed-point processors, since all the operations involved must be done in software.
Floating point processors usages
In military radar, the floating point processor is frequently used because its performance is essential.
Floating point processing is good for doing large FFTs, so we can implement the FIR in the frequency domain.
Appropriate in systems where gain coefficients are changing with time or coefficients have large dynamic ranges.
Overflow and scaling

When multiplying two Q15 numbers, which are in the range of -1 to 1, the product will be in the same range.
However, when two Q15 numbers are added, the sum may fall outside this range, leading to an overflow.
Overflows can cause major problems by generating erroneous results.
The simplest correction method for overflow is scaling.
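To make the addition overflow concrete, the sketch below adds 0.75 and 0.5 in Q15 and keeps only 16 bits of the result. It assumes two's complement wraparound with no saturation logic. The helper wrap16 is hypothetical and simply mimics a 16-bit register.

```python
def wrap16(n):
    """Keep the low 16 bits of n and reinterpret them as two's complement."""
    n &= 0xFFFF
    return n - 0x10000 if n & 0x8000 else n

a = 24576    # 0.75 in Q15
b = 16384    # 0.5 in Q15
s = wrap16(a + b)
print(s, s / 32768)   # -24576 -0.75
```

The true sum is 1.25, but the stored result reads as -0.75. This is the kind of erroneous result that overflow produces.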
Scaling
The idea of scaling is to scale down the system input before performing any processing, then to scale up the resulting output to the original size.
Scaling can be applied to most filtering and transform operations.
An easy way to achieve scaling is by shifting.
Since a right shift of 1 is equivalent to a division by 2, we can scale the input repeatedly by 0.5 until all overflows disappear.
The output can then be rescaled back by the total scaling amount.
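The procedure can be sketched as follows, using a plain running sum as a stand-in for a filter. The choice of operation, the function names, and the final rescaling in floating point are all assumptions made for illustration. On real hardware the rescaled output would need a wider word, since 1.5 does not fit in Q15.

```python
Q15_MAX, Q15_MIN = 32767, -32768

def overflows(samples):
    """True if the running sum of the Q15 samples ever leaves the 16-bit range."""
    acc = 0
    for s in samples:
        acc += s
        if not Q15_MIN <= acc <= Q15_MAX:
            return True
    return False

def scaled_sum(samples):
    """Halve the input until the running sum fits; return the sum and shift used."""
    shift = 0
    scaled = list(samples)
    while overflows(scaled):
        shift += 1                               # one more factor of 0.5
        scaled = [s >> shift for s in samples]   # total right shift applied to the input
    return sum(scaled), shift

x = [24576, 16384, 8192]            # 0.75, 0.5, 0.25 in Q15; true sum is 1.5
total, shift = scaled_sum(x)
print(total, shift)                 # 24576 1
print((total / 32768) * 2**shift)   # 1.5
```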
Comparison
Characteristic         Floating point    Fixed point
Dynamic range          much larger       smaller
Resolution             comparable        comparable
Speed                  comparable        comparable
Ease of programming    much easier       more difficult
Compiler efficiency    more efficient    less efficient
Power consumption      comparable        comparable
Chip cost              comparable        comparable
System cost            comparable        comparable
Design cost            less              more
Time to market         faster            slower
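To put rough numbers on the dynamic range row, the sketch below computes the ratio of the largest to the smallest nonzero magnitude in decibels. It assumes a 16-bit signed fixed-point word and normalized IEEE 754 single-precision values. These formats and the helper dynamic_range_db are assumptions for illustration, not figures from the slides.

```python
import math

def dynamic_range_db(largest, smallest):
    """Ratio of largest to smallest nonzero magnitude, in decibels."""
    return 20 * math.log10(largest / smallest)

# 16-bit signed fixed point: magnitudes from 1 LSB up to 32768 LSBs.
fixed16 = dynamic_range_db(32768, 1)

# Single precision, normalized values only: 2**-126 up to (2 - 2**-23) * 2**127.
float32 = dynamic_range_db((2 - 2.0**-23) * 2.0**127, 2.0**-126)

print(round(fixed16, 1))   # 90.3
print(round(float32))      # 1529
```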
Conclusion
DSP processors are designed as fixed point and floating point.
Fixed-point:
Partition a binary word into integer and fractional parts
Radix point is in a fixed position
Floating-point:
Large dynamic range
Composed of a mantissa and exponent
Scaling solves the problem of overflow.
Comparison between fixed point and floating point
