5.dsp UNIT 5 With 8X


UNIT V

DIGITAL SIGNAL PROCESSORS

By
Dr.A.Sridevi, M.E , Ph.D
Department of ECE
M.Kumarasamy College of Engineering
Course Outcome CO 5: Experiment with DSP Processors. (K3)
Topics
1. Architectural Features: Harvard, Von-Neumann, VLIW architecture
2. MAC Unit, ALU, Pipelining
3. Architecture of TMS320C5x
4. Instruction set
5. Addressing Modes
6. Application programs, TMS320C8x Processor
Content beyond syllabus: DSP implementation in Embedded system
How to classify processors
• Categorized by memory organization
– Von-Neumann architecture
– Harvard architecture
• Categorized by instruction type
– CISC
– RISC
– VLIW

Harvard architecture
• Separate memory into 2 types
– Program memory
– Data memory
[Figure: Harvard architecture block diagram (Processor, Data memory, Program memory; Address and Data lines; Read/Write signals)]

Harvard Architecture

Von-Neumann architecture
• Program and data share a single memory
• Example: 80x86 architecture

VLIW
• Very Long Instruction Word
• One instruction contains several independent
operations that are executed in parallel
• Example: the four operations
LOAD R4,R2
ADD R1,R2
OR R5,R2
INVERT R7
are packed into 1 instruction:
LOAD R4,R2 | ADD R1,R2 | OR R5,R2 | INVERT R7

[Figure: VLIW processor block diagram (memory holding Operation1 to Operation4 in one instruction word, processor, register file)]

VLIW

VLIW
• Instruction-level parallelism
• Relies on the compiler to determine which
instructions may be executed in parallel
• The number of operations in a VLIW instruction
is equal to the number of execution units in
the processor

VLIW : Pros and cons
• Advantage
– Simpler hardware than RISC and potentially faster, because the compiler schedules the parallel operations
• Disadvantage
– Adding execution units changes the instruction word, so the program must
be recompiled

VLIW
• Widely used in DSP(Digital Signal processing)
applications
– high performance and low cost
• Less successful in general-purpose computers
– customers demand software compatibility
between generations of a processor

CISC vs. RISC vs. VLIW

Single-Cycle MAC unit
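[Figure: single-cycle MAC unit. A multiplier forms each product $a_i x_i$; an adder sums it with the previous result held in a register ($a_i x_i + a_{i-1} x_{i-1}$), accumulating $\sum_{i=0}^{n} a_i x_i$.]
• Can compute a sum of n products in n cycles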
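The accumulation performed by the MAC unit can be sketched in software. This is only an illustrative model, not processor code: the function name `mac_sum` and the cycle counter are assumptions made for the example, with one loop iteration standing in for one MAC cycle.

```python
def mac_sum(coeffs, samples):
    """Model a single-cycle MAC: one multiply-accumulate per simulated cycle."""
    if len(coeffs) != len(samples):
        raise ValueError("coeffs and samples must have the same length")
    acc = 0      # accumulator register holding the running sum
    cycles = 0   # number of simulated MAC cycles
    for a, x in zip(coeffs, samples):
        acc += a * x   # multiply and add happen in the same step
        cycles += 1
    return acc, cycles


if __name__ == "__main__":
    total, cycles = mac_sum([1, 2, 3, 4], [5, 6, 7, 8])
    print(total, cycles)   # 70 4
```

Four products take four simulated cycles here, which is the "n products in n cycles" property stated on the slide.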

Single Instruction - Multiple Data
(SIMD)
• A technique for data-level parallelism by
employing a number of processing elements
working in parallel

Pipelining
• DSPs commonly feature deep pipelines
• TMS320C6x processors have 3 pipeline stages with a
number of phases (cycles):
– Fetch
• Program Address Generate (PG)
• Program Address Send (PS)
• Program ready wait (PW)
• Program receive (PR)
– Decode
• Dispatch (DP)
• Decode (DC)
– Execute
• 6 to 10 phases

DSP vs. Microcontroller
• DSP
– Harvard architecture
– VLIW/SIMD (parallel execution units)
– No bit-level operations
– Hardware MACs
– DSP applications
• Microcontroller
– Mostly von Neumann architecture
– Single execution unit
– Flexible bit-level operations
– No hardware MACs
– Control applications

DSP Manufacturers

• TEXAS INSTRUMENTS (TI)
• MOTOROLA
• ANALOG DEVICES (ADSP)

Summary of Processors
TI DSP FAMILY:
Fixed point DSPs - `C1X, `C2X, `C2XX, `C5X, `C54X, `C62X, `C64X, `C55X
Floating point DSPs - `C3X, `C4X, `C67X
Multiprocessor DSPs - `C8X

ANALOG DEVICES DSP FAMILY:
SHARC Processors - ADSP210X, ADSP211X, ADSP212X
ADSP 16-bit Processors - ADSP218X, ADSP219X
Mixed Signal DSPs - ADSP2199x
Tiger SHARC Processors - TS10X, TS20X
Blackfin Processors - ADSP BF53X

MOTOROLA DSP FAMILY:
Fixed point DSPs - DSP561XX, DSP3XX, DSP566xx
Floating point DSPs - DSP96XXX
Multi-Star Core DSPs - MSC711X, MSC81XX
DSP & Controllers - DSP568XX
TI DSP FAMILY

• Fixed point DSPs

`C1X,`C2X,`C2XX,`C5X,`C54X,`C62X,`C64X,
`C55X,`DM64X
• Floating point DSPs
`C3X , `C4X, `C67X,`DM67X
• Multiprocessor DSPs
`C8X

Overview of TMS320C5X Family Processors
• 16-bit fixed-point DSP
• The `C5X generation consists of the `C50, `C51, `C52, `C53, `C53S, `C56, `C57 and `C57S DSPs (the architecture is the same for all of them; they differ in on-chip memory)
• Fabricated using CMOS technology
• The architecture design is based on the `C25
• Executes up to 50 MIPS (Million Instructions Per Second); available speeds: 50/40/28.6/20 MIPS
• Source compatibility with `C1X, `C2X & `C2XX DSPs
• Reduced power consumption, two operating voltages (5 V and 3.3 V) and two power-down modes
• 224K x 16-bit maximum addressable memory space (64K PM, 64K DM, 64K I/O and 32K global)
• On-chip peripherals and highly specialized instruction set (e.g. MAC and MACD)
• These characteristics make this family the ideal
choice for a wide range of processing
applications:
• Very flexible instruction set
• Innovative, parallel architectural design
• Inherent operational flexibility
• High-speed performance
• Cost-effectiveness

C5x advantages
• Enhanced TMS320 architectural design for increased
performance and versatility.
• Modular architectural design for fast development of
spin-off devices.
• Advanced integrated-circuit processing technology for
increased performance and low power consumption.
• Source code compatibility with ’C1x, ’C2x, and ’C2xx
DSPs for fast and easy performance upgrades.
• Enhanced instruction set for faster algorithms and for
optimized high-level language operation.
• Reduced power consumption and increased radiation
hardness because of new static design techniques.

Major Blocks

• CPU
• ON-CHIP MEMORY
• INTERNAL BUSES
• ON-CHIP PERIPHERALS

Block diagram of TMS320C50 Digital Signal Processor

BUS STRUCTURE
• High degree of parallelism.
• Four major buses
Program bus (PB)
Program address bus (PAB)
Data read bus (DB)
Data read address bus (DAB)

CPU

1. Central Arithmetic and Logic Unit (CALU)


2. Parallel Logic Unit (PLU)
3. Auxiliary Register Arithmetic Unit (ARAU)
4. Memory Mapped Registers (MMREGS)
5. Control Unit

CPU Block

Central Arithmetic and Logic Unit (CALU) – 32-bits
Dedicated Unit for processing

• Multiplier – (16x16) Parallel Multiplier


• Arithmetic and Logic unit (ALU) – 32-bit ALU
• Accumulator – 32-bits
• ACC buffer – 32-bits
• Shifters (Scalers) – 16-bit Barrel Shift Registers

FOUR PRODUCT MODES

If PM = 00 the PREG 32-bit output is not shifted when transferred into the ALU or stored.

If PM = 01 the PREG output is left-shifted 1 bit when transferred into the ALU or stored, and the
LSB is zero filled.

If PM = 10, the PREG output is left-shifted 4 bits when transferred into the ALU or stored, and
the 4 LSBs are zero filled.

If PM = 11, the PREG output is right-shifted 6 bits, sign extended, when transferred into the ALU
or stored, and the 6 LSBs are lost.
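The four product modes can be modelled with a small function. This is a sketch under stated assumptions: `shift_preg` and `to_signed32` are hypothetical helper names, and PREG is treated as a 32-bit two's-complement value, with the left-shifted results wrapped back to 32 bits.

```python
def to_signed32(value):
    """Wrap an integer to a signed 32-bit two's-complement value."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


def shift_preg(preg, pm):
    """Return the PREG value as seen by the ALU for product mode pm (0..3)."""
    preg = to_signed32(preg)
    if pm == 0b00:
        return preg                      # no shift
    if pm == 0b01:
        return to_signed32(preg << 1)    # left shift 1, LSB zero filled
    if pm == 0b10:
        return to_signed32(preg << 4)    # left shift 4, 4 LSBs zero filled
    if pm == 0b11:
        return preg >> 6                 # arithmetic right shift, sign extended
    raise ValueError("pm must be in the range 0..3")


if __name__ == "__main__":
    for pm in range(4):
        print(pm, shift_preg(-128, pm))
```

Python's `>>` on a negative integer is an arithmetic shift, so the sign extension described for PM = 11 comes for free in this model.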

Parallel Logic Unit (PLU)
Dedicated Unit for control operation
• Directly set, clear, test, or toggle multiple bits in a control/status register or
any data memory location.
• Logic operation path to data memory values without affecting the contents of
the ACC or the PREG
• Logic Unit – 16-bits, with the dynamic bit manipulation register (DBMR)

Auxiliary Register Arithmetic Unit (ARAU)
Dedicated Unit for Indirect Addressing Mode of Operation

• Arithmetic Unit – 16-bits


• Auxiliary Registers (ARs) – 8 Registers (AR0-AR7) – 16-bits
• Index Register (INDX) – One – 16bits
• Circular Buffers – Two – With Five Registers (CBCR,CBSR1,CBSR2, CBER1
and CBER2)
• Auxiliary Register Compare Register (ARCR) – One – 16bits

• Total registers in the ARAU – 15

Auxiliary Register Arithmetic Unit

Memory Mapped Registers (MMREGS)

• The first page of the DATA Memory is mapped to the processor registers
(CPU and Peripheral registers)
• There are 96 such registers in `C5X processor
Control Unit

Complete CPU Unit

Memory
• On-chip Memory
• ROM – Space
• SARAM (Single Access RAM)
• DARAM (Dual Access RAM)
• Off-Chip Memory
• Single Access RAM

Buses
• Program Memory Bus
– Program Memory Address bus - 16 bits
– Program Memory Data bus - 16 bits
• Data Memory Bus
– Data Memory Address bus - 16 bits
– Data Memory Data bus - 16 bits
• External Bus
– External Address bus - 16 bits
On-Chip Memory Details of `C5X Processor

On-chip Peripherals
• Clock Generator
• Timer
• Standard Serial Port (One or Two)
• Time Division Multiplexed Serial Port
• Buffered Serial Port
• Parallel I/O Ports
• Host Port Interface (HPI) – Only in `C57 Processor
• Test/Emulation Port (JTAG)

TMS320C5X Assembly Language Instructions

Assembly language syntax

• The C5X assembler assumes the following assembly language syntax.
• A source statement contains four ordered fields: label, mnemonic, operand list and comment.
• The general syntax for source statements is given below.

[ label ] [:] mnemonic [ operand list ] [;comment ]

Guidelines for writing `C5X assembly language programs:

• All statements must begin with a label, a blank, an asterisk, or a semicolon.
• Labels are optional; if used, they must begin in column 1.
• Labels may be placed either before the instruction mnemonic on the same line or on the preceding line in the first column.
• One or more blanks must separate each field. Tab characters are equivalent to blanks.
• Comments are optional. Comments that begin in column 1 can begin with an asterisk or a semicolon (* or ;), but comments that begin in any other column must begin with a semicolon.
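The field rules above can be illustrated with a small splitter. It is a rough sketch, not the TI assembler: `parse_statement` is a hypothetical helper, the label `START` in the usage lines is invented for the example, and a semicolon inside a quoted operand is not handled.

```python
def parse_statement(line):
    """Split one source line into label, mnemonic, operands and comment."""
    fields = {"label": None, "mnemonic": None, "operands": None, "comment": None}
    # An asterisk in column 1 marks the whole line as a comment.
    if line.startswith("*"):
        fields["comment"] = line[1:].strip()
        return fields
    # A semicolon starts a comment in any column.
    code, sep, rest = line.partition(";")
    if sep:
        fields["comment"] = rest.strip()
    # A label, if present, starts in column 1; the colon is optional.
    if code and not code[0].isspace():
        parts = code.split(None, 1)
        fields["label"] = parts[0].rstrip(":")
        code = parts[1] if len(parts) > 1 else ""
    # Blanks or tabs separate the mnemonic from the operand list.
    parts = code.split(None, 1)
    if parts:
        fields["mnemonic"] = parts[0]
        if len(parts) > 1 and parts[1].strip():
            fields["operands"] = parts[1].strip()
    return fields


if __name__ == "__main__":
    print(parse_statement("START:  LAR AR3, #LASTAP ; Point to oldest sample"))
    print(parse_statement("        RPT #127"))
    print(parse_statement("* full-line comment"))
```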
`C5X Addressing modes

The `C5X processors can address 64K words of program memory and
96K words of data memory. The `C5X processors support the following
six addressing modes:
• Direct addressing
• Memory-mapped register addressing
• Indirect addressing
• Immediate addressing
• Dedicated-register addressing
• Circular addressing

Immediate addressing
• The immediate addressing mode can be used to load either a 16-bit constant or a constant of length 13, 9 or 7 bits.
• Accordingly it is referred to as long immediate or short immediate addressing mode.
• The syntax for immediate addressing mode is the symbol # followed by the operand.
Examples
LD #1000h,A ; the operand 1000h is loaded into accumulator A
ADD #10h,B ; the operand 10h is added to the content of accumulator B, result stored in B
`C5X Instructions That Support Immediate Addressing

Direct addressing mode
• The data memory of `C5X processors is split into 512 pages, each 128 words long.
• The data memory page pointer (DP) in ST0 holds the number of the current data memory page.
• In the direct addressing mode of the `C5X, only the lower-order 7 bits of the address are specified in the instruction. The upper 9 bits are taken from the data memory page pointer (DP).

Example:
LDP #20h ; the data page pointer (DP) is loaded with 20h, so DP points to page 32 (i.e. address 1000h)
LD 20h,A ; load the content of location 20h of page 32 (i.e. memory address 1020h) into accumulator A
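The address arithmetic in this example can be checked with a few lines of code. The sketch below assumes only what the slide states (a 9-bit DP and a 7-bit offset); `direct_address` is a hypothetical helper name.

```python
def direct_address(dp, offset):
    """Form a 16-bit data address from a 9-bit page pointer and a 7-bit offset."""
    if not 0 <= dp < 512:
        raise ValueError("DP must fit in 9 bits (512 pages)")
    if not 0 <= offset < 128:
        raise ValueError("offset must fit in 7 bits (128 words per page)")
    return (dp << 7) | offset   # DP supplies the 9 MSBs, the instruction the 7 LSBs


if __name__ == "__main__":
    print(hex(direct_address(0x20, 0x00)))   # start of page 32
    print(hex(direct_address(0x20, 0x20)))   # location 20h within page 32
```

With DP = 20h the page starts at 1000h and offset 20h lands on 1020h, matching the comments in the example.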
Indirect addressing
• In indirect addressing mode, the auxiliary registers AR0-AR7 are used for accessing data.
• The auxiliary register pointer (ARP) selects the current auxiliary register.
• The contents of ARP can be temporarily stored in the register ARB.
• The AR used for addressing can be updated automatically, so a separate instruction is not required to update the AR.
• The contents of an AR can be incremented or decremented.

`C5X Indirect addressing mode options for address displacement

Circular Addressing

• Algorithms such as convolution, correlation, and finite impulse response (FIR) filters use circular buffers in memory to implement a sliding window, which contains the most recent data to be processed.
• The ’C5x supports two concurrent circular buffers operating via the ARs.
The five memory-mapped registers that control the circular buffer operation:
• CBSR1 – Circular buffer 1 start register
• CBSR2 – Circular buffer 2 start register
• CBER1 – Circular buffer 1 end register
• CBER2 – Circular buffer 2 end register
• CBCR – Circular buffer control register
• The 8-bit CBCR enables and disables the circular buffer operation.
Steps to define a circular buffer:
• The start and end addresses are loaded into the corresponding buffer registers first.
• Next, a value between the start and end registers for the circular buffer is loaded into an AR.
• The corresponding circular buffer enable bit in the CBCR should be set.
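The three setup steps can be mimicked in a toy model. This is a simplified sketch, not a description of the hardware: it assumes a step of 1 and that the AR wraps to the start address when it is updated while equal to the end address. The class and attribute names are hypothetical stand-ins for CBSR, CBER, the AR and the CBCR enable bit.

```python
class CircularBuffer:
    """Toy model of one circular buffer addressed through an auxiliary register."""

    def __init__(self, start, end, ar):
        if not start <= ar <= end:
            raise ValueError("AR must lie between the start and end addresses")
        self.cbsr = start      # step 1: start address register
        self.cber = end        # step 1: end address register
        self.ar = ar           # step 2: AR loaded with a value inside the buffer
        self.enabled = False   # step 3: enable bit, initially cleared

    def enable(self):
        self.enabled = True

    def post_increment(self):
        """Return the current address, then advance the AR by one word."""
        address = self.ar
        if self.enabled and self.ar == self.cber:
            self.ar = self.cbsr    # wrap back to the start of the buffer
        else:
            self.ar += 1
        return address


if __name__ == "__main__":
    buf = CircularBuffer(start=0x200, end=0x203, ar=0x202)
    buf.enable()
    print([hex(buf.post_increment()) for _ in range(6)])
```

The printed addresses run 0x202, 0x203 and then wrap to 0x200, so a sliding window never needs an explicit pointer reset.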

Bit-Reversed Addressing
• Bit-reversed addressing mode is used in FFT computation algorithms.
• In the bit-reversed addressing mode, the index register (INDX) specifies one-half the size of the FFT.
• The value contained in the current AR must be equal to $2^{n-1}$, where n is an integer, and the FFT size is $2^n$.
• An auxiliary register points to the physical location of a data value. When INDX is added to the current AR using bit-reversed addressing, addresses are generated in a bit-reversed fashion.

Example for Bit-reversed addressing mode
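Bit-reversed address generation is commonly explained as an addition whose carry propagates from the most significant bit toward the least significant bit. The sketch below models that reverse-carry addition on buffer offsets only; the function names are hypothetical, and a base address of 0 is assumed.

```python
def reverse_bits(value, bits):
    """Reverse the lowest `bits` bits of value."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result


def reverse_carry_add(ar, indx, bits):
    """Add indx to ar with the carry propagating from MSB to LSB."""
    mask = (1 << bits) - 1
    total = (reverse_bits(ar & mask, bits) + reverse_bits(indx & mask, bits)) & mask
    return reverse_bits(total, bits)


def bit_reversed_sequence(fft_size):
    """Offsets visited when INDX = fft_size / 2 is added repeatedly."""
    bits = fft_size.bit_length() - 1   # fft_size is assumed to be a power of two
    indx = fft_size // 2               # one-half the size of the FFT
    ar = 0
    offsets = []
    for _ in range(fft_size):
        offsets.append(ar)
        ar = reverse_carry_add(ar, indx, bits)
    return offsets


if __name__ == "__main__":
    print(bit_reversed_sequence(8))   # [0, 4, 2, 6, 1, 5, 3, 7]
```

For an 8-point FFT, INDX = 4 and the offsets come out in the order 0, 4, 2, 6, 1, 5, 3, 7, which is the input ordering an in-place radix-2 FFT expects.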

Memory-mapped register addressing
• Page 0 in the `C5X holds some registers and the interrupt vector addresses.
• These locations can be accessed by specifying the actual address or by the register name.
• For example, auxiliary register 0 can be denoted either by the actual memory location (10h) used for storing its value or by the symbol AR0.
• Since these memory locations can be used interchangeably with the register names, the registers corresponding to page 0 are referred to as memory-mapped registers (MMRs).
• With memory-mapped register addressing, the memory-mapped registers can be modified without affecting the current data page pointer value.
• The memory-mapped register addressing mode operates like the direct addressing mode, except that the 9 MSBs of the address are forced to 0 instead of being loaded with the contents of the DP.
• This allows the memory-mapped registers of data page 0 to be modified directly without the overhead of changing the DP or an auxiliary register.
`C5X Memory mapped register instructions
• LAMM – Load accumulator with memory-mapped register
• LMMR – Load memory-mapped register
• SAMM – Store accumulator in memory-mapped register
• SMMR – Store memory-mapped register
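The difference between direct addressing and memory-mapped register addressing comes down to where the 9 MSBs of the address come from. The sketch below contrasts the two computations; both function names are hypothetical, and the AR0 location (10h) is taken from the text above.

```python
def direct_address(dp, field):
    """Direct addressing: 9 MSBs from DP, 7 LSBs from the instruction."""
    return ((dp & 0x1FF) << 7) | (field & 0x7F)


def mmr_address(field):
    """Memory-mapped register addressing: the 9 MSBs are forced to 0."""
    return field & 0x7F


if __name__ == "__main__":
    dp = 0x20    # current data page, left untouched by MMR accesses
    ar0 = 0x10   # location of AR0 on page 0, per the text
    print(hex(direct_address(dp, ar0)))   # 0x1010, a word on page 32
    print(hex(mmr_address(ar0)))          # 0x10, AR0 itself
```

The same 7-bit field reaches page 32 in direct mode but always page 0 in memory-mapped mode, which is why DP does not need to be reloaded.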
Dedicated-Register Addressing
• The dedicated-register addressing mode operates like the long immediate addressing mode, except that the address comes from one of two special-purpose memory-mapped registers in the CPU: the block move address register (BMAR) and the dynamic bit manipulation register (DBMR).
• The advantage of this addressing mode is that the address of the block of memory to be acted upon can be changed during execution of the program.
• The syntax for dedicated-register addressing can be stated in one of two ways:
• Specify BMAR by its predefined symbol
• Exclude the immediate value from a parallel logic unit (PLU) instruction

DSP Development Tools
• Simulator
Software tool package consisting of Assembler, Linker and Debugger.
• Starter kit (DSK)
Stand-alone board connected to the PC through a serial port, parallel port or USB port. The tools supplied along with the board are the DSK Assembler, Linker and Debugger. External supply is provided.
• Evaluation Modules (EVMs)
Stand-alone board that plugs into a PCI slot of the PC. No external supply is required. The tools required are Assembler, Linker and Debugger.

The New Tool for all TI processors

• Code Composer Studio (CCS)
Integrated Development Environment (IDE). This tool supports code development in assembly language, C, or other signal processing platforms such as MATLAB. It includes a Compiler, Assembler, Linker and Debugger.

Tool Flow
Starter Kit (Flow-1)

Assembly file (.asm) → Assembler & Linker → Executable file (.dsk)
Executable file (.dsk) → Debugger → Starter Kit (target DSP)

(Flow-2) for CCS

Assembly or C files (.asm & .C) → Assembler → Object file (.obj) → Linker → Executable file (.out)
Executable file (.out) → Debugger → Starter Kit (target DSP)

Applications of DSPs
• General Applications
Digital Filtering, Convolution, Correlation, DFTs, FFTs, Adaptive filters,
Waveform Generation etc.
• Speech and Video
Speech recognition, Voice mail, Speech synthesis, Image compression,
Animation, Robot vision etc.
• Instrumentation and control
Disk control, Servo control, Motor control, Robot control etc.
• Telecommunications
Echo cancellation, Digital PBXs, Line repeaters, Digital Modulation and
Demodulation etc.
• Medical
Hearing Aids, Ultrasound equipment, Diagnostic tools etc.

Application: FIR Filter on a TMS320C5x
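[Figure: coefficient and data memory layout]

COEFFP  .set  02000h      ; Program memory address of the coefficients
X       .set  0300h       ; Newest data sample (LASTAP - 127)
LASTAP  .set  037Fh       ; Oldest data sample

        LAR   AR3,#LASTAP ; Point AR3 to the oldest sample
        MAR   *,AR3       ; Make AR3 the current auxiliary register
        ZAP               ; Clear ACC and PREG before accumulating
        RPT   #127        ; Repeat the next instruction 128 times
        MACD  COEFFP,*-   ; Add previous product, multiply, move data up one word
        APAC              ; Add the final product to ACC
        SACH  Y,1         ; Store result -- note shift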
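The RPT/MACD loop can be mirrored in a short software model. This is a behavioural sketch only: `fir_macd` is a hypothetical name, the delay line is indexed by sample age (index 0 is the newest sample), and fixed-point scaling such as the shift applied by SACH is left out.

```python
def fir_macd(coeffs, delay_line):
    """Model one FIR output computed with a repeated MACD followed by APAC.

    coeffs[k] multiplies the sample that is k steps old; delay_line[0] is the
    newest sample and delay_line[-1] the oldest.
    Returns (output, shifted delay line).
    """
    n = len(coeffs)
    if len(delay_line) != n:
        raise ValueError("delay line and coefficient lengths must match")
    line = list(delay_line) + [0]   # extra slot receives the oldest sample
    acc = 0    # accumulator, cleared first
    preg = 0   # product register, cleared first
    for k in range(n - 1, -1, -1):      # walk from oldest to newest, like *-
        acc += preg                     # add the previous product
        preg = coeffs[k] * line[k]      # multiply coefficient by sample
        line[k + 1] = line[k]           # data move: sample ages by one step
    acc += preg                         # APAC adds the last product
    return acc, line[:n]


if __name__ == "__main__":
    y, shifted = fir_macd([1, 2, 3], [10, 20, 30])
    print(y, shifted)   # 140 [10, 10, 20]
```

After the call, index 0 of the shifted line is free to be overwritten with the next input sample, which is what makes the combined multiply and data move attractive for FIR filters.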

; initialize data page pointer
        stm   #frameSize-1,brc          ; compute 256 outputs
        rptbd firloop-1
        stm   #N/2,bk                   ; FIR circular buffer size
        ld    *ar6+,b                   ;
        mvdd  *ar4,*ar5+0%              ; move old x[n-N/2]
        stl   b,*ar4%
        add   *ar4+0%,*ar5+0%,a         ; a = x[n] + x[n-N/2-1]
        rptz  b,#(N/2-1)                ; b, do N/2-1
        firs  *ar4+0%,*ar5+0%,coeffs    ; b += a * h[i], *+ar4(2)%
firloop: ret
TMS320C8x
• Single-chip multiprocessor digital signal
processor (DSP) devices.
• Applications: image processing, two-dimensional,
three-dimensional, and virtual
reality graphics, digital audio and video
compression, and telecommunications.

• Supports multiply-intensive, pixel, and bit-field
processing needs.
• High-precision operations and floating-point
computations.
• High data bandwidth and effective
interprocessor communication.
• In-circuit emulation allows the execution of each of the
processors to be controlled and monitored.
• The TMS320C8x allows all registers and
latches to be tested.
• Master processor (MP): 32-bit RISC processor
with an IEEE-754 floating-point unit (FPU).
• Parallel processors (PPs): 32-bit integer units.
• Test access port (TAP): internal emulation and
boundary-scan paths.
• Transfer controller (TC): 64-bit port.
• Local port (L): a 32-bit local port on each PP
provides access to on-chip SRAM data that is
local to the PP.
• Global port (G): a 32-bit global port on each PP
provides access to all on-chip shared SRAM data.
• Instruction port (I): instruction ports on the
PPs and on the MP provide access to on-chip
instruction caches.
• Cache or data port (C/D): a 64-bit cache or
data port on the MP provides access to the
data cache or on-chip SRAMs.
• On-chip register port (OCR): the MP accesses
the memory-mapped TC and VC (’C80 only)
registers through a 32-bit port.
• Crossbar: the on-chip processors use crossbar
switching to access on-chip RAM.
C8X Features
The ’C8x TC has four main features:
• Source and destination addresses are processed by independent
controllers: the source controller and the
destination controller.
• The packet transfer FIFO (first-in, first-out
logic).
• A separate cache controller uses the cache
buffer to buffer incoming data.
• The request queuing and prioritization logic.
C8X – Multiprocessor
• Memory Interfacing
• Dynamic Bus Sizing
• Packet Transfers
• Externally-Initiated Packet Transfers (XPTs)
• Video Controller (VC)

Application in Speech processing
• Speech is the most natural form of human-human
communications.
• Speech is related to language; linguistics is a branch of social
science.
• Speech is also related to sound and acoustics, a branch of
physical science.
• Therefore, speech is one of the most intriguing signals that
humans work with every day.
• Purpose of speech processing:
– To understand speech as a means of communication
– To represent speech for transmission and reproduction
– To analyze speech for automatic recognition and extraction of
information
– To discover some physiological characteristics of the talker
Speech Coding
• Speech Coding is the process of transforming a speech signal
into a representation for efficient transmission and storage of
speech
– narrowband and broadband wired telephony
– cellular communications
– Voice over IP (VoIP) to utilize the Internet as a real-time
communications medium
– secure voice for privacy and encryption for national security
applications
– extremely narrowband communications channels, e.g.,
battlefield applications using HF radio
– storage of speech for telephone answering machines, IVR
systems, prerecorded messages
Speech Synthesis
• Synthesis of Speech is the process of generating a
speech signal using computational means for effective
human-machine interactions
– machine reading of text or email messages
– telematics feedback in automobiles
– talking agents for automatic transactions
– automatic agent in customer care call center
– handheld devices such as foreign language phrasebooks,
dictionaries, crossword puzzle helpers
– announcement machines that provide information such
as stock quotes, airlines schedules, weather reports, etc.
Speech Recognition and Understanding

• Recognition and Understanding of Speech is the process of extracting
usable linguistic information from a speech signal in support of
human-machine communication by voice
– command and control (C&C) applications,
e.g., simple commands for spreadsheets, presentation graphics,
appliances
– voice dictation to create letters, memos, and other documents
– natural language voice dialogues with machines to enable
help desks, call centers
– voice dialing for cellphones and from PDAs and other small
devices
– agent services such as calendar entry and update, address list
modification and entry, etc.

Thank You
