DSP Unit 5


DIGITAL SIGNAL PROCESSOR

• A digital signal processor (DSP) is an integrated circuit designed for high-speed data manipulation. It is used in:
  • audio, communications, and image manipulation
  • other data-acquisition and data-control applications

• Applications:
  • Digital filtering (FIR and IIR)
  • FFT
  • Convolution and matrix multiplication


COMMON DSP FEATURES

• Harvard architecture
• Dedicated single-cycle multiply-accumulate (MAC) instruction (hardware MAC units)
• Single-Instruction Multiple-Data (SIMD) architecture
• Very Long Instruction Word (VLIW) architecture
• Pipelining
• Saturation arithmetic
• Zero-overhead looping
• Hardware circular addressing
• Cache
• DMA (allows peripherals to access main memory without the intervention of the CPU)
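Two of the features listed above, saturation arithmetic and hardware circular addressing, can be illustrated in software. The following is a minimal C sketch, assuming 16-bit samples; the function names, the circ_buf type, and the buffer length BUF_LEN are hypothetical and chosen only for illustration. A DSP performs both operations in hardware as part of an arithmetic or address update; here they are written out explicitly.

```c
#include <stdint.h>

/* Saturating 16-bit addition: on overflow the result is clamped to the
   largest or smallest representable value instead of wrapping around. */
int16_t sat_add16(int16_t a, int16_t b)
{
    int32_t sum = (int32_t)a + (int32_t)b;   /* exact sum, cannot overflow in 32 bits */
    if (sum > INT16_MAX) return INT16_MAX;
    if (sum < INT16_MIN) return INT16_MIN;
    return (int16_t)sum;
}

/* Circular (modulo) addressing: the index wraps back to the start of the
   buffer instead of running past its end. */
#define BUF_LEN 4   /* hypothetical buffer length for this sketch */

typedef struct {
    int16_t data[BUF_LEN];
    unsigned next;          /* index of the slot to be overwritten next */
} circ_buf;

void circ_push(circ_buf *cb, int16_t sample)
{
    cb->data[cb->next] = sample;
    cb->next = (cb->next + 1) % BUF_LEN;   /* wrap to 0 after the last slot */
}
```

For example, sat_add16(32000, 1000) clamps to 32767 rather than wrapping to a negative value, and pushing five samples into the four-slot buffer overwrites the oldest slot with the fifth sample.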
• The internal hardware of a digital signal processor consists of many blocks:
1. CPU
2. Arithmetic Logic Unit (ALU)
3. Accumulators
4. Barrel shifter
5. Multiplier unit
6. Compare, Select and Store Unit (CSSU)
7. Memory cache
8. DMA controller

DSP ARCHITECTURE

• Von Neumann architecture
• Harvard architecture
• Super Harvard architecture (SHARC)

VON NEUMANN ARCHITECTURE
• Von Neumann architecture uses a single memory, shared by program instructions and data, and a single bus for transferring data into and out of the central processing unit (CPU).
HARVARD ARCHITECTURE

• It has separate memories for data and program instructions, with separate buses for each. Since the buses operate independently, program instructions and data can be fetched at the same time, improving the speed over the single-bus design.
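The speed advantage of separate buses can be seen with a deliberately simplified cycle count. The C sketch below assumes that each instruction needs one program-memory fetch plus a fixed number of data-memory transfers, and that each bus completes one transfer per cycle; the function names are hypothetical, and this is not a timing model of any particular processor.

```c
/* Simplified bus-cycle model (an assumption for illustration only):
   every instruction needs one program-memory fetch plus data_accesses
   data-memory transfers, and each bus carries one transfer per cycle. */

/* Von Neumann: one shared bus, so all transfers are serialized. */
unsigned long von_neumann_cycles(unsigned long n_instr, unsigned data_accesses)
{
    return n_instr * (1UL + data_accesses);
}

/* Harvard: the program bus and the data bus work in parallel, so the
   instruction fetch overlaps the data transfers. */
unsigned long harvard_cycles(unsigned long n_instr, unsigned data_accesses)
{
    unsigned long per_instr = data_accesses > 1 ? data_accesses : 1;
    return n_instr * per_instr;
}
```

Under these assumptions, 100 instructions that each read one data operand take 200 cycles on a single bus but 100 cycles with separate program and data buses.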
SUPER HARVARD ARCHITECTURE (SHARC)

• SHARC DSPs improve on the Harvard architecture by adding an instruction cache and an I/O controller.
MULTIPLIER-ACCUMULATOR UNIT (MAC)

• The MAC unit consists of a multiplier with a pair of input registers that hold the inputs to the multiplier, and a 32-bit product register that holds the result of a multiplication.
• The output of the product register is connected to a double-precision accumulator, where the products are accumulated.
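The MAC unit is the workhorse of the FIR filtering listed among the applications: each output sample is a sum of products, y[n] = sum over k of h[k]·x[n-k]. The C sketch below mimics the data path just described, assuming 16-bit coefficients and samples. The name fir_mac is hypothetical, x_hist[k] is assumed to hold x[n-k] (newest sample first), and a 64-bit accumulator stands in for the double-precision accumulator; real devices use their own accumulator widths.

```c
#include <stddef.h>
#include <stdint.h>

/* One FIR output sample computed as a chain of multiply-accumulates:
   two 16-bit inputs, a 32-bit product, and a wider accumulator. */
int64_t fir_mac(const int16_t *h, const int16_t *x_hist, size_t taps)
{
    int64_t acc = 0;                                  /* accumulator cleared */
    for (size_t k = 0; k < taps; ++k) {
        int32_t product = (int32_t)h[k] * x_hist[k];  /* 16 x 16 -> 32-bit product register */
        acc += product;                               /* add product into accumulator */
    }
    return acc;
}
```

The wide accumulator matters: two products of -32768 × -32768 already sum to 2^31, which no longer fits in a signed 32-bit register.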
VERY LONG INSTRUCTION WORD (VLIW)

• A technique for instruction-level parallelism: instructions that have no dependencies on one another are executed in parallel.
• Example of a single VLIW instruction: F=a+b; c=e/g; d=x&y; w=z*h;
• A VLIW processor reads a relatively large group of instructions and executes them at the same time.
• The multiple functional units share a common multiported register file for fetching operands and storing results.
• The read/write crossbar provides parallel random access by multiple functional units to the multiported register file.
• Execution of operations in the functional units is carried out concurrently with the load/store of data between RAM and the register file.
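The independence condition in the VLIW example can be checked mechanically. Below is a minimal C sketch, assuming each operation has the form dst = src1 op src2 with single-letter register names, as in the example instruction; vliw_op and can_bundle are hypothetical names. The check is conservative: it rejects any group in which one operation reads or overwrites a register that another operation in the same group writes. It checks only register dependencies, not whether enough functional units are free.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical encoding of one operation: dst = src1 op src2. */
typedef struct {
    char dst, src1, src2, op;
} vliw_op;

/* Returns true if the operations are mutually independent and could be
   packed into one VLIW instruction (register dependencies only). */
bool can_bundle(const vliw_op *ops, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        for (size_t j = 0; j < n; ++j) {
            if (i == j) continue;
            if (ops[j].src1 == ops[i].dst || ops[j].src2 == ops[i].dst)
                return false;   /* op j reads a register that op i writes */
            if (ops[j].dst == ops[i].dst)
                return false;   /* two ops write the same register */
        }
    }
    return true;
}
```

The example instruction F=a+b; c=e/g; d=x&y; w=z*h; passes, because no destination (F, c, d, w) is used as a source by another operation. Grouping c=F/g with F=a+b fails, since it needs the new value of F.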
PIPELINING
• The instruction cycle has four phases:
1. Fetch phase: the instruction is fetched from program memory.
2. Decode phase: the instruction is decoded.
3. Memory read phase: the operand required to execute the instruction is read from data memory.
4. Execution phase: the instruction is executed and the result is stored in one of the registers or in memory.
• Pipelining overlaps these phases: while one instruction executes, the following instructions are being read, decoded, and fetched.
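With four phases, an ideal pipeline (no stalls or branches) completes N instructions in N + 3 cycles instead of 4N: the first instruction needs four cycles to pass through all phases, and each later one finishes one cycle after its predecessor. The C sketch below computes both counts and prints a cycle-by-cycle occupancy chart; the function names are hypothetical and the model ignores pipeline hazards.

```c
#include <stdio.h>

#define STAGES 4   /* fetch, decode, memory read, execute */

/* Without pipelining, each instruction finishes all phases before the next starts. */
unsigned long cycles_sequential(unsigned long n) { return n * STAGES; }

/* Ideal pipeline: a new instruction enters the fetch phase every cycle. */
unsigned long cycles_pipelined(unsigned long n) { return n == 0 ? 0 : n + STAGES - 1; }

/* Print which instruction occupies each phase in each cycle. */
void print_pipeline(unsigned n)
{
    static const char *phase[STAGES] = {"Fetch", "Decode", "Read", "Exec"};
    unsigned total = (unsigned)cycles_pipelined(n);
    for (unsigned s = 0; s < STAGES; ++s) {
        printf("%-7s", phase[s]);
        for (unsigned c = 0; c < total; ++c) {
            /* instruction i (0-based) is in phase s during cycle i + s */
            if (c >= s && c - s < n)
                printf(" I%-2u", c - s + 1);
            else
                printf(" .  ");
        }
        printf("\n");
    }
}
```

For 10 instructions this gives 40 cycles without pipelining and 13 with it; calling print_pipeline(3) shows the diagonal pattern of instructions moving through the phases.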
COMMERCIAL PROCESSOR
• The TMS320C5x generation of Texas Instruments digital signal processors, which includes the TMS320C50, is fabricated with CMOS IC technology.
• The TMS320C50 is a fixed-point, 16-bit processor running at 40 MHz.
• Its single-instruction execution time is 50 ns.
• Its architectural design combines an advanced Harvard architecture, on-chip peripherals, and on-chip memory.
• The functional block diagram of the TMS320C5x can be divided into four sub-blocks:
(1) Bus structure
(2) Central processing unit
(3) On-chip memory
(4) On-chip peripherals
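As a fixed-point 16-bit processor, the ‘C5x operates on integer words. Fractional values are commonly handled with the Q15 convention, in which a 16-bit two's-complement word w represents w/32768, a value in [-1, 1); Q15 is a general fixed-point convention, not something stated on this slide, and q15_mul is a hypothetical helper. From the figures above, 50 ns per instruction also corresponds to 1/(50 ns) = 20 million instructions per second.

```c
#include <stdint.h>

/* Multiply two Q15 fractions: the 32-bit product is in Q30 format and is
   shifted back to Q15.  The right shift of a negative value is
   implementation-defined in C; mainstream compilers use an arithmetic shift. */
int16_t q15_mul(int16_t a, int16_t b)
{
    int32_t p = (int32_t)a * b;        /* 16 x 16 -> 32-bit product, Q30 */
    p >>= 15;                          /* rescale to Q15 */
    if (p > INT16_MAX) p = INT16_MAX;  /* only -1 * -1 overflows; saturate */
    return (int16_t)p;
}
```

For example, 0.5 × 0.5 is q15_mul(16384, 16384), which returns 8192 (0.25). The product -1 × -1 would be +1, which Q15 cannot represent, so the result saturates to 32767.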


BUS STRUCTURE
• The ‘C5x architecture has four buses:
1. Program bus (PB)
2. Program address bus (PAB)
3. Data read bus (DB)
4. Data read address bus (DAB)
• The program bus carries the instruction code and immediate operands from program memory to the CPU.
• The program address bus provides addresses to program memory space for both reads and writes.
• The data read bus interconnects various elements of the CPU to data memory space.
• The data read address bus provides the address to access data memory space.
CENTRAL PROCESSING UNIT (CPU)

• The ‘C5x CPU consists of these elements:
1) Central arithmetic logic unit (CALU)
2) Parallel logic unit (PLU)
3) Auxiliary register arithmetic unit (ARAU)
4) Memory-mapped registers
5) Program controller
CENTRAL ARITHMETIC LOGIC UNIT (CALU)

• The CPU uses the CALU to perform two's-complement arithmetic.
• The CALU consists of these elements:
1) 16 × 16-bit multiplier
2) Product register (PREG)
3) 32-bit arithmetic logic unit (ALU)
4) 32-bit accumulator (ACC)
5) 32-bit accumulator buffer (ACCB)
6) Shifters
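The CALU path from multiplier to accumulator can be modelled in a few lines. Below is a minimal C sketch, assuming the element widths listed above (16 × 16-bit multiplier, 32-bit PREG, ACC, and ACCB); the struct and function names are hypothetical, the shifters are omitted, and accumulator overflow is not modelled. It also shows what two's-complement arithmetic means in practice: a negative product such as -3 × 4 = -12 is stored as the bit pattern 0xFFFFFFF4, and the same adder handles it like any other value.

```c
#include <stdint.h>

/* Toy model of the CALU registers (simplified, not cycle-accurate). */
typedef struct {
    int32_t preg;   /* product register */
    int32_t acc;    /* 32-bit accumulator */
    int32_t accb;   /* 32-bit accumulator buffer */
} calu;

/* Multiplier: two 16-bit two's-complement operands -> 32-bit PREG. */
void calu_multiply(calu *c, int16_t a, int16_t b) { c->preg = (int32_t)a * b; }

/* ALU: add PREG to ACC (caller keeps values in range; overflow not modelled). */
void calu_add_product(calu *c) { c->acc += c->preg; }

/* Copy ACC into the accumulator buffer. */
void calu_save_acc(calu *c) { c->accb = c->acc; }
```

For example, multiplying -3 by 4, adding the product, then multiplying 5 by 4 and adding again leaves ACC = -12 + 20 = 8.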
PARALLEL LOGIC UNIT (PLU)
• The PLU performs the Boolean operations or bit manipulations required of high-speed controllers.
• The PLU can set, clear, test, or toggle bits in a status register, a control register, or any data memory location.
• The PLU performs logic operations without affecting the contents of the ACC or PREG.
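The four PLU bit operations map directly onto masks. Below is a minimal C sketch, assuming a 16-bit word; the plu_* function names are hypothetical. The point they illustrate is that the target word is modified in place, and no accumulator or product register is involved.

```c
#include <stdint.h>

/* word stands for any status register, control register, or data memory location. */
void plu_set(uint16_t *word, uint16_t mask)    { *word |= mask; }            /* force bits to 1 */
void plu_clear(uint16_t *word, uint16_t mask)  { *word &= (uint16_t)~mask; } /* force bits to 0 */
void plu_toggle(uint16_t *word, uint16_t mask) { *word ^= mask; }            /* invert bits */
int  plu_test(uint16_t word, uint16_t mask)    { return (word & mask) != 0; } /* any bit set? */
```

For a register holding 0x00F0, plu_set with mask 0x0001 sets the lowest bit, and plu_toggle with mask 0x8001 then flips the top and bottom bits.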
AUXILIARY REGISTER ARITHMETIC UNIT (ARAU) - USED FOR INDIRECT ADDRESS CALCULATION

• ARs: eight 16-bit auxiliary registers, AR0-AR7
• ARP (Auxiliary Register Pointer), which selects the current auxiliary register
• 16-bit ALU
• ARCR (Auxiliary Register Compare Register)
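Indirect addressing through the ARAU can be sketched as follows. This is a simplified model with assumed behaviour, not the full ‘C5x addressing-mode set, and arau, read_indirect_postinc, and ar_reached_arcr are hypothetical names. ARP selects the current auxiliary register, that register supplies the data-memory address, the 16-bit ARAU post-increments it, and ARCR holds a value to compare against (used here as a loop end address).

```c
#include <stdint.h>

/* Simplified model of the ARAU registers. */
typedef struct {
    uint16_t ar[8];   /* AR0-AR7: 16-bit address registers */
    unsigned arp;     /* auxiliary register pointer, 0..7 */
    uint16_t arcr;    /* auxiliary register compare register */
} arau;

/* Read the word addressed by the current AR, then post-increment that AR
   (uint16_t arithmetic wraps at 64K, like a 16-bit address). */
uint16_t read_indirect_postinc(arau *u, const uint16_t *data_mem)
{
    uint16_t addr = u->ar[u->arp];
    u->ar[u->arp] = (uint16_t)(addr + 1);
    return data_mem[addr];
}

/* Compare the current AR with ARCR, e.g. to terminate a loop over a buffer. */
int ar_reached_arcr(const arau *u) { return u->ar[u->arp] == u->arcr; }
```

For example, setting ARP = 3, AR3 = 1, and ARCR = 3, then calling read_indirect_postinc until ar_reached_arcr returns true, reads data words 1 and 2 and leaves AR3 at 3.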
MEMORY-MAPPED REGISTERS
• The memory-mapped registers are used for indirect data address pointers, temporary storage, CPU status and control, or integer arithmetic processing through the ARAU.
• The ‘C5x has 96 registers mapped into page 0 of the data memory space.
• All ‘C5x DSPs have 28 CPU registers and 16 input/output (I/O) port registers, but have different numbers of peripheral and reserved registers.
PROGRAM CONTROLLER
• The program controller consists of these elements:
1) Program counter
2) Status and control registers
3) Hardware stack
4) Address generation logic
5) Instruction register
• The program controller contains logic circuitry that decodes the operational instructions, manages the CPU pipeline, stores the status of CPU operations, and decodes the conditional operations.
MEMORY
• On-chip memory:
1) Program read-only memory (PROM)
2) Data/program dual-access RAM (DARAM)
3) Data/program single-access RAM (SARAM)

• Memory space:
  • 64K-word program memory space
  • 64K-word local data memory space
  • 64K-word input/output ports
  • 32K-word global data memory space


PROGRAM ROM
• Size: 2K to 32K words
• This memory is used for booting program code from slower external ROM or EPROM into fast on-chip or external RAM.
DATA/PROGRAM DUAL-ACCESS RAM
• All ‘C5x DSPs carry a 1056-word × 16-bit on-chip dual-access RAM (DARAM).
• The DARAM is divided into three individually selectable memory blocks:
1) 512-word data or program DARAM block B0
2) 512-word data DARAM block B1
3) 32-word data DARAM block B2
• The DARAM is primarily intended to store data values but, when needed, can also be used to store programs.
DATA/PROGRAM SINGLE-ACCESS RAM

• All ‘C5x DSPs except the ‘C52 carry a 16-bit on-chip single-access RAM (SARAM) of various sizes.
• Code can be booted from an off-chip ROM and then, once it is loaded into the on-chip SARAM, executed at full speed.
• The SARAM can be configured by software in one of three ways:
  • All SARAM configured as data memory
  • All SARAM configured as program memory
  • SARAM configured as both data memory and program memory
ON-CHIP PERIPHERALS
• The ‘C5x DSP on-chip peripherals are:
1) Clock generator
2) Hardware timer
3) Software-programmable wait-state generators
4) Parallel I/O ports
5) Host port interface (HPI)
6) Serial port
7) Buffered serial port (BSP)
8) Time-division multiplexed (TDM) serial port
9) User-maskable interrupts
CLOCK GENERATOR
• It consists of an internal crystal oscillator and a phase-locked loop (PLL) circuit, which are used to generate the clock signal for the processor.
SERIAL PORTS
• Three different kinds of serial ports are available:
1. General-purpose serial port
2. Time-division multiplexed (TDM) serial port
3. Buffered serial port (BSP)
• They can operate at up to one-quarter of the machine cycle rate.
• Serial communication data can be either bytes or words.
HARDWARE TIMER
• The programmable hardware timer, with a 4-bit prescaler, clocks at a rate between 1/2 and 1/32 of the machine cycle rate.
• Three registers, namely the timer counter register (TIM), the timer period register (PRD), and the timer control register (TCR), control and operate the timer.
• The timer can be stopped, restarted, reset, or disabled by specific status bits.
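The interplay of the prescaler and the TIM/PRD registers can be pictured with a software model. This is a simplified sketch under assumed behaviour, not a transcription of the ‘C5x timer: the struct, its field psc_reload, and timer_tick are hypothetical, and each call represents one tick of the timer's input clock. In this model the prescaler divides the input clock by (psc_reload + 1) and TIM counts down from PRD, so an interrupt occurs every (PRD + 1) × (psc_reload + 1) ticks.

```c
#include <stdint.h>

/* Simplified timer model (assumed behaviour, for illustration only). */
typedef struct {
    uint16_t tim;         /* timer counter register */
    uint16_t prd;         /* timer period register */
    uint8_t  psc;         /* 4-bit prescaler counter */
    uint8_t  psc_reload;  /* hypothetical prescaler reload value, 0..15 */
    int      stopped;     /* models a "stop" status bit */
} timer_model;

/* Advance the model by one input-clock tick; returns 1 when an interrupt fires. */
int timer_tick(timer_model *t)
{
    if (t->stopped) return 0;
    if (t->psc > 0) { t->psc--; return 0; }
    t->psc = t->psc_reload;                /* prescaler underflow: reload it */
    if (t->tim > 0) { t->tim--; return 0; }
    t->tim = t->prd;                       /* counter underflow: reload from PRD */
    return 1;                              /* timer interrupt */
}
```

With PRD = 2 and psc_reload = 1 (and both counters starting at their reload values), the first interrupt arrives on tick 6 and then every 6 ticks.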
SOFTWARE-PROGRAMMABLE WAIT-STATE GENERATORS
• Software-programmable wait-state generators insert wait states into external accesses so that the high-speed DSP can exchange data with slower external devices connected to it.
HOST PORT INTERFACE
• It is an 8-bit parallel I/O port that provides an interface to a host processor.
• Information is exchanged between the DSP and the host processor through on-chip memory that is accessible to both the host processor and the ‘C57.
