5.dsp UNIT 5 With 8X
By
Dr.A.Sridevi, M.E , Ph.D
Department of ECE
M.Kumarasamy College of Engineering
1
Course Outcome CO 5: Experiment with DSP Processors. (K3)
Topics
1. Architectural Features: Harvard, Von-Neumann, VLIW architecture
2. MAC Unit - ALU - Pipelining
3. Architecture of TMS320C5x
4. Instruction set
5. Addressing Modes
6. Application programs - TMS320C8x Processor
Content beyond syllabus – DSP implementation in
Embedded system
2
How to classify processors
• Categorized by memory organization
– Von-Neumann architecture
– Harvard architecture
• Categorized by instruction type
– CISC
– RISC
– VLIW
3
Harvard architecture
• Memory is separated into 2 types
– Program memory
– Data memory
[Figure: processor connected to separate program and data memories; labels: Address, Data, Read, Write]
4
Harvard Architecture
5
Von-Neumann architecture
• Program and data are combined in a single memory
• Example : 80x86 architecture
6
VLIW
• Very Long Instruction Word
• One instruction contains several independent
operations that are executed in parallel
[Figure: four operations packed into 1 instruction]
LOAD R4,R2 | ADD R1,R2 | OR R5,R2 | INVERT R7
7
[Figure: Memory and Register file]
8
VLIW
9
VLIW
• Instruction-level parallelism
• Relies on the compiler to determine which
instructions may be executed in parallel
• The number of operations in a VLIW instruction
is equal to the number of execution units in
the processor
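The compiler's job described above can be sketched as a greedy packer. This is only an illustration, not how any particular VLIW compiler works. It assumes the first operand of each operation is the destination register, and `pack_bundles` is a hypothetical helper name.

```python
def pack_bundles(ops, width):
    """Greedily pack operations into VLIW instructions (bundles).

    ops   : list of (name, dest, srcs) tuples in program order (assumed format)
    width : number of execution units, i.e. operations per instruction
    """
    bundles = []
    current, written, read = [], set(), set()
    for name, dest, srcs in ops:
        # An operation cannot join the bundle if it reads or rewrites a
        # register written in the bundle, or overwrites one that is read.
        conflict = bool(set(srcs) & written) or dest in written or dest in read
        if conflict or len(current) == width:
            bundles.append(current)
            current, written, read = [], set(), set()
        current.append(name)
        written.add(dest)
        read.update(srcs)
    if current:
        bundles.append(current)
    return bundles
```

With the four operations from the earlier slide and four execution units, all four land in one instruction. With only two units the same program needs two instructions, which is why a change in the number of units forces recompilation.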
10
VLIW : Pros and cons
• Advantage
– Simpler hardware that can run faster than RISC, because the compiler does the scheduling
• Disadvantage
– If the number of execution units increases, the program must
be recompiled
11
VLIW
• Widely used in DSP (Digital Signal Processing)
applications
– high performance and low cost
• Less successful in general-purpose computer
– customers demand software compatibility
between generations of a processor
12
CISC vs. RISC vs. VLIW
13
Single-Cycle MAC unit
[Figure: a Multiplier forms the product a_i x_i; an Adder adds it to the running sum held in a Register]
$$\sum_{i=0}^{n} a_i x_i$$
Can compute a sum of n products in n cycles
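A minimal software model of the accumulate loop, assuming one multiply-accumulate per cycle. The function name `mac_sum` and the cycle counter are illustrative only.

```python
def mac_sum(a, x):
    """Sum of products computed one multiply-accumulate at a time."""
    acc = 0      # the register that holds the running sum
    cycles = 0
    for ai, xi in zip(a, x):
        acc += ai * xi   # multiplier output added to the register
        cycles += 1      # a single-cycle MAC does this once per cycle
    return acc, cycles
```

For example, `mac_sum([1, 2, 3], [4, 5, 6])` gives a sum of 32 after 3 cycles.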
14
Single Instruction - Multiple Data
(SIMD)
• A technique for data-level parallelism by
employing a number of processing elements
working in parallel
15
Pipelining
• DSPs commonly feature deep pipelines
• TMS320C6x processors have 3 pipeline stages with a
number of phases (cycles):
– Fetch
• Program Address Generate (PG)
• Program Address Send (PS)
• Program ready wait (PW)
• Program receive (PR)
– Decode
• Dispatch (DP)
• Decode (DC)
– Execute
• 6 to 10 phases
16
DSP vs. Microcontroller
• DSP
– Harvard Architecture
– VLIW/SIMD (parallel execution units)
– No bit level operations
– Hardware MACs
– DSP applications
• Microcontroller
– Mostly von Neumann Architecture
– Single execution unit
– Flexible bit-level operations
– No hardware MACs
– Control applications
17
DSP Manufacturers
18
Summary of Processors
TI DSP FAMILY:
• Fixed point DSPs
`C1X, `C2X, `C2XX, `C5X, `C54X, `C55X, `C62X, `C64X, `DM64X
• Floating point DSPs
`C3X, `C4X, `C67X, `DM67X
• Multiprocessor DSPs
`C8X
20
Overview of TMS320C5X Family Processors
16-bit fixed-point DSP
The `C5X generation consists of the `C50, `C51, `C52, `C53, `C53S,
`C56, `C57 and `C57S DSPs (the architecture is the same for all the
processors; the difference is in on-chip memory)
Fabricated using CMOS technology
The architecture design is based on the `C25
Executes up to 50 MIPS (Million Instructions Per Second); speed
variants: 50/40/28.6/20 MIPS
Source compatibility with `C1X, `C2X & `C2XX DSPs
Reduced power consumption, two operating voltages (5 V and 3.3 V)
and two power-down modes
224K × 16-bit maximum addressable memory space (64K PM, 64K
DM, 64K I/O and 32K global)
On-chip peripherals and highly specialized instruction set (e.g.
MAC and MACD)
21
• The following characteristics make this family the ideal
choice for a wide range of processing
applications:
• Very flexible instruction set
• Innovative, parallel architectural design
• Inherent operational flexibility
• High-speed performance
• Cost-effectiveness
22
C5x advantages
• Enhanced TMS320 architectural design for increased
performance and versatility.
• Modular architectural design for fast development of
spin-off devices.
• Advanced integrated-circuit processing technology for
increased performance and low power consumption.
• Source code compatibility with ’C1x, ’C2x, and ’C2xx
DSPs for fast and easy performance upgrades.
• Enhanced instruction set for faster algorithms and for
optimized high-level language operation.
• Reduced power consumption and increased radiation
hardness because of new static design techniques.
23
Major Blocks
• CPU
• ON-CHIP MEMORY
• INTERNAL BUSES
• ON-CHIP PERIPHERALS
24
Block diagram of TMS320C50 Digital Signal Processor
25
BUS STRUCTURE
• High degree of parallelism.
• Four major buses
Program bus (PB)
Program address bus (PAB)
Data read bus (DB)
Data read address bus (DAB)
26
CPU
27
CPU Block
28
Central Arithmetic and Logic Unit (CALU) – 32-bits
Dedicated Unit for processing
The PM bits select the shift applied to the 32-bit PREG output:
• PM = 00: the PREG output is not shifted when transferred into the ALU or stored.
• PM = 01: the PREG output is left-shifted 1 bit when transferred into the ALU or stored, and the
LSB is zero filled.
• PM = 10: the PREG output is left-shifted 4 bits when transferred into the ALU or stored, and
the 4 LSBs are zero filled.
• PM = 11: the PREG output is right-shifted 6 bits, sign extended, when transferred into the ALU
or stored, and the 6 LSBs are lost.
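The four PM settings can be modeled in a few lines. This is a sketch that treats PREG as a signed 32-bit value; the helper name `pscale` is hypothetical and not part of any TI tool.

```python
def pscale(preg, pm):
    """Return the PREG output after the shift selected by the PM bits."""
    def to_signed32(value):
        value &= 0xFFFFFFFF                      # keep 32 bits
        return value - (1 << 32) if value & 0x80000000 else value

    if pm == 0b00:
        return to_signed32(preg)                 # no shift
    if pm == 0b01:
        return to_signed32(preg << 1)            # left 1, LSB zero filled
    if pm == 0b10:
        return to_signed32(preg << 4)            # left 4, 4 LSBs zero filled
    if pm == 0b11:
        return to_signed32(preg) >> 6            # right 6, sign extended
    raise ValueError("PM is a 2-bit field")
```

For a product of 0x100, the four modes give 0x100, 0x200, 0x1000 and 0x4.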
29
30
Parallel Logic Unit (PLU)
Dedicated Unit for control operation
• Directly set, clear, test, or toggle multiple bits in a control/status register or
any data memory location.
• Logic operation path to data memory values without affecting the contents of
the ACC or the PREG
• 16-bit logic unit, used together with the dynamic bit manipulation register (DBMR)
31
Auxiliary Register Arithmetic Unit (ARAU)
Dedicated Unit for Indirect Addressing Mode of Operation
32
Auxiliary Register Arithmetic Unit
33
Memory Mapped Registers (MMREGS)
• The first page of the DATA Memory is mapped to the processor registers
(CPU and Peripheral registers)
• There are 96 such registers in `C5X processor
34
Control Unit
35
Complete CPU Unit
36
Memory
• On-chip Memory
• ROM – Space
• SARAM (Single Access RAM)
• DARAM (Dual Access RAM)
• Off-Chip Memory
• Single Access RAM
Buses
• Program Memory Bus
– Program Memory Address bus - 16 bits
– Program Memory Data bus - 16 bits
• Data Memory Bus
– Data Memory Address bus - 16 bits
– Data Memory Data bus - 16 bits
• External Bus
– External Address bus - 16 bits
On-Chip Memory Details of `C5X Processor
38
On-chip Peripherals
• Clock Generator
• Timer
• Standard Serial Port (One or Two)
• Time Division Multiplexed Serial Port
• Buffered Serial Port
• Parallel I/O Ports
• Host Port Interface (HPI) – Only in `C57 Processor
• Test/Emulation Port (J-Tag)
39
TMS320C5X Assembly Language Instructions
The `C5X processors can address 64K words of program memory and
96K words of data memory. The `C5X processors support the following
six addressing modes:
Direct addressing
Memory-mapped register addressing
Indirect addressing
Immediate addressing
Dedicated-register addressing
Circular addressing
41
Immediate addressing
In immediate addressing mode the operand is a constant contained in the instruction: either a
16-bit constant or a shorter constant of length 13, 9 or 7 bits.
Accordingly it is referred to as long immediate or short immediate addressing mode.
The syntax for immediate addressing mode is the symbol # followed by the operand.
Examples
LD #1000h,A ; the operand 1000h is loaded into accumulator A
ADD #10h,B ; the operand 10h is added to the content of accumulator B, result stored in B
`C5X Instructions That Support Immediate Addressing
42
Direct addressing mode
The data memory of `C5X processors is split into 512 pages of 128 words each.
The data memory page pointer (DP) in ST0 holds the number of the current data memory page.
In the direct addressing mode of the `C5X, only the lower 7 bits of the address are specified in the
instruction. The upper 9 bits are taken from the data memory page pointer (DP).
Example:
LDP #20h ; the data page pointer (DP) is loaded with 20h, so DP points to page 32 (i.e. address 1000h)
LD 20h,A ; load the content of location 20h of page 32 (i.e. memory address 1020h) into accumulator A
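The address formation can be checked with a small calculation. This is a sketch, and `direct_address` is a hypothetical helper: the 9-bit DP supplies the upper bits and the instruction supplies the lower 7 bits.

```python
def direct_address(dp, offset):
    """Form the 16-bit data address from DP (9 bits) and the 7-bit offset."""
    if not 0 <= dp < 512:
        raise ValueError("DP selects one of 512 pages")
    if not 0 <= offset < 128:
        raise ValueError("each page holds 128 words")
    return (dp << 7) | offset   # DP in bits 15..7, offset in bits 6..0
```

`direct_address(0x20, 0x20)` reproduces the address 1020h from the example above.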
Indirect addressing
In indirect addressing mode, the auxiliary registers AR0-AR7 hold the addresses used for
accessing data.
ARP points to the current auxiliary register.
The contents of ARP can be temporarily stored in the register ARB.
The AR used for addressing can be updated automatically as part of the access.
Hence a separate instruction is not required to update the AR.
The contents of an AR can be incremented or decremented.
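A toy model of the mechanism, assuming data memory is a Python list and that the update is applied after the access. The class and method names are illustrative only.

```python
class AuxRegisters:
    """Eight auxiliary registers plus the pointer that selects one of them."""

    def __init__(self):
        self.ar = [0] * 8   # AR0-AR7 hold data addresses
        self.arp = 0        # ARP selects the current auxiliary register

    def access(self, memory, update=0):
        """Read memory through the current AR, then adjust that AR."""
        address = self.ar[self.arp]
        value = memory[address]
        # Automatic update: no separate instruction is needed.
        self.ar[self.arp] = (address + update) & 0xFFFF
        return value
```

Calling `access(memory, +1)` repeatedly walks upward through a buffer; `access(memory, -1)` walks downward.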
44
Circular Addressing
45
Bit-Reversed Addressing
Bit-reversed addressing mode is used in FFT algorithms.
In the bit-reversed addressing mode, the index register (INDX) specifies one-half the
size of the FFT.
The value contained in the current AR must be equal to $2^{n-1}$, where n is an
integer, and the FFT size is $2^n$.
An auxiliary register points to the physical location of a data value. When INDX
is added to the current AR using bit-reversed addressing, addresses are generated
in a bit-reversed fashion.
Example for Bit-reversed addressing mode
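The address sequence can be reproduced in software. This sketch models the bit-reversed addition as an addition whose carry propagates from the most significant bit toward the least significant bit, and it generates offsets starting from 0 rather than physical addresses. Both function names are hypothetical.

```python
def reverse_carry_add(addr, index, bits):
    """Add index to addr with the carry running from MSB to LSB."""
    def reverse(value):
        result = 0
        for _ in range(bits):
            result = (result << 1) | (value & 1)
            value >>= 1
        return result

    mask = (1 << bits) - 1
    # Reversing, adding normally, and reversing back gives the same result.
    return reverse((reverse(addr) + reverse(index)) & mask)


def bit_reversed_offsets(n):
    """Offsets visited for an FFT of size 2**n, with INDX = 2**(n-1)."""
    size = 1 << n
    indx = size >> 1          # one-half the size of the FFT
    offsets, addr = [], 0
    for _ in range(size):
        offsets.append(addr)
        addr = reverse_carry_add(addr, indx, n)
    return offsets
```

For an 8-point FFT (n = 3, INDX = 4) the offsets are 0, 4, 2, 6, 1, 5, 3, 7.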
46
Memory-mapped register addressing
Page 0 of the `C5X data memory holds registers and interrupt vector addresses.
These locations can be accessed by specifying the actual address or by the register name
(e.g. auxiliary register 0 can be denoted either by the actual memory location (10h) used for
storing its value or by the symbol AR0).
Since these memory locations can be used interchangeably with the register names, the registers
corresponding to page 0 are referred to as memory-mapped registers (MMRs).
With memory-mapped register addressing, the memory-mapped registers can be modified
without affecting the current data page pointer value.
The memory-mapped register addressing mode operates like the direct addressing mode, except
that the 9 MSBs of the address are forced to 0 instead of being loaded with the contents of the DP.
This allows the memory-mapped registers of data page 0 to be modified directly without the
overhead of changing the DP or an auxiliary register.
`C5X Memory mapped register instructions
LAMM: Load accumulator with memory-mapped register
LMMR: Load memory-mapped register
SAMM: Store accumulator in memory-mapped register
SMMR: Store memory-mapped register
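The difference from direct addressing can be shown with one function. This is a sketch; `data_address` and its `memory_mapped` flag are illustrative names, not part of the instruction set.

```python
def data_address(offset, dp, memory_mapped=False):
    """Form a data address from a 7-bit offset.

    Direct addressing takes the 9 MSBs from DP; memory-mapped register
    addressing forces them to 0, so the access always lands in page 0.
    """
    if not 0 <= offset < 128:
        raise ValueError("offset is a 7-bit field")
    if not 0 <= dp < 512:
        raise ValueError("DP is a 9-bit field")
    page = 0 if memory_mapped else dp
    return (page << 7) | offset
```

With DP = 20h, offset 10h reaches address 1010h in direct mode but address 10h (AR0) in memory-mapped mode, and DP itself is left unchanged.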
Dedicated-Register Addressing
The dedicated-register addressing mode operates like the long immediate
addressing mode, except that the address comes from one of two special-purpose
memory-mapped registers in the CPU: the block move address register (BMAR)
and the dynamic bit manipulation register (DBMR).
The advantage of this addressing mode is that the address of the block of memory
to be acted upon can be changed during execution of the program.
The syntax for dedicated-register addressing can be stated in one of two ways:
• Specify BMAR by its predefined symbol
• Exclude the immediate value from a parallel logic unit (PLU) instruction
48
DSP Development Tools
• Simulator
Software tool package consisting of an Assembler, Linker and Debugger
• Starter kit (DSK)
Stand-alone board connected to the PC through a serial port, parallel port or
USB port. The tools supplied along with the board are the DSK Assembler,
Linker and Debugger. An external supply is provided.
• Evaluation Modules (EVMs)
Board that is plugged into a PCI slot of the PC. No external
supply is required. The tools required are an Assembler, Linker and Debugger
49
Tool Flow
Starter Kit (Flow-1)
50
Applications of DSPs
• General Applications
Digital Filtering, Convolution, Correlation, DFTs, FFTs, Adaptive filters,
Waveform Generation etc.
• Speech and Video
Speech recognition, Voice mail, Speech synthesis, Image compression,
Animation, Robot vision etc.
• Instrumentation and control
Disk control, Servo control, Motor control, Robot control etc.
• Telecommunications
Echo cancellation, Digital PBXs, Line repeaters, Digital Modulation and
Demodulation etc.
• Medical
Hearing Aids, Ultrasound equipment, Diagnostic tools etc.
51
Application: FIR Filter on a TMS320C5x
Coefficients
Data
…
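LAR AR3, #LASTAP ; AR3 points to the oldest sample
RPT #127 ; repeat the next instruction 128 times
MACD COEFFP, *- ; multiply-accumulate, then move the sample one place along the delay line
APAC ; add the final product to the accumulator
SACH Y,1 ; store the high accumulator word in Y, note the left shift by 1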
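The RPT/MACD/APAC sequence above can be modeled in Python to see what it computes. This is a sketch under stated assumptions: the accumulator and product register start at zero (the elided setup code is not modeled), the output shift of SACH is ignored, and `fir_macd` is a hypothetical name.

```python
def fir_macd(coeffs, delay):
    """Model one FIR output computed with a repeated MACD.

    coeffs[k] is h[k]; delay[k] is x[n-k], so delay[-1] is the oldest sample.
    Returns (accumulator, delay line after the data moves).
    """
    if len(coeffs) != len(delay):
        raise ValueError("one coefficient per stored sample")
    acc = 0
    preg = 0
    line = list(delay) + [0]        # spare slot receives the oldest sample
    for k in range(len(coeffs) - 1, -1, -1):   # start at the oldest sample
        acc += preg                 # accumulate the previous product
        preg = coeffs[k] * line[k]  # multiply coefficient by sample
        line[k + 1] = line[k]       # data move: sample shifts one slot up
    acc += preg                     # APAC: add the final product
    return acc, line[:len(delay)]
```

After the call, slot 0 of the returned delay line still holds the newest sample and is ready to be overwritten by the next input, which is the reason the data move is combined with the multiply-accumulate.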
52
; initialize data page pointer
        stm   #frameSize-1,brc        ; compute 256 outputs
        rptbd firloop-1
        stm   #N/2,bk                 ; FIR circular buffer size
        ld    *ar6+,b                 ;
        mvdd  *ar4,*ar5+0%            ; move old x[n-N/2]
        stl   b,*ar4%                 ;
        add   *ar4+0%,*ar5+0%,a       ; a = x[n] + x[n-N/2-1]
        rptz  b,#(N/2-1)              ; zero b, do N/2-1
        firs  *ar4+0%,*ar5+0%,coeffs  ; b += a * h[i]
        …     *+ar4(2)%               ;
firloop: ret
53
TMS320C8x
• Single chip multiprocessor digital signal
processor (DSP) devices.
• Applications: image processing; two-dimensional,
three-dimensional, and virtual reality graphics;
digital audio and video compression; and
telecommunications.
54
• Multiply-intensive, pixel, and bit field
processing needs.
• High-precision operations and floating-point
computations.
• High data bandwidth and effective
interprocessor communication.
• In-circuit emulation allows the execution of each
of the processors to be controlled and
monitored.
• The TMS320C8x allows all registers and
latches to be tested.
55
56
• Master processor (MP). A 32-bit RISC processor
with an IEEE-754 floating-point unit (FPU).
• Parallel processors (PPs). 32-bit integer units.
• Test access port (TAP). Internal emulation and
boundary-scan paths.
• Transfer controller (TC). 64-bit port.
• Local port (L)- A 32-bit local port on each PP
provides access to on-chip SRAM data that is
local to the PP.
• Global port (G). A 32-bit global port on each PP
provides access to all on-chip shared SRAM data
57
• Instruction port (I). Instruction ports on the
PPs and on the MP provide access to on-chip
instruction caches.
• Cache or data port (C/D). A 64-bit cache or
data port on the MP provides access to the
data cache or on-chip SRAMs.
• On-chip register port (OCR). The MP accesses
the memory-mapped TC and VC (’C80 only)
registers through a 32-bit port.
• Crossbar. The on-chip processors use crossbar
switching to access on-chip RAM.
58
C8X Features
• The ’C8x TC has four main features:
• Source and destination address processing
with independent controllers: the source
controller and the destination controller
• The packet transfer FIFO (first-in, first-out)
logic
• A separate cache controller that uses the cache
buffer to buffer incoming data
• The request queuing and prioritization logic
59
C8X – Multiprocessor
• Memory Interfacing
• Dynamic Bus Sizing
• Packet Transfers
• Externally-Initiated Packet Transfers (XPTs)
• Video Controller (VC)
60
Application in Speech processing
• Speech is the most natural form of human-human
communications.
• Speech is related to language; linguistics is a branch of social
science.
• Speech is also related to sound and acoustics, a branch of
physical science.
• Therefore, speech is one of the most intriguing signals that
humans work with every day.
• Purpose of speech processing:
– To understand speech as a means of communication
– To represent speech for transmission and reproduction
– To analyze speech for automatic recognition and extraction of
information
– To discover some physiological characteristics of the talker
61
62
Speech Coding
• Speech Coding is the process of transforming a speech signal
into a representation for efficient transmission and storage of
speech
– narrowband and broadband wired telephony
– cellular communications
– Voice over IP (VoIP) to utilize the Internet as a real-time
communications medium
– secure voice for privacy and encryption for national security
applications
– extremely narrowband communications channels, e.g.,
battlefield applications using HF radio
– storage of speech for telephone answering machines, IVR
systems, prerecorded messages
63
64
65
Speech Synthesis
• Speech synthesis is the process of generating a
speech signal using computational means for effective
human-machine interactions
– machine reading of text or email messages
– telematics feedback in automobiles
– talking agents for automatic transactions
– automatic agent in customer care call center
– handheld devices such as foreign language phrasebooks,
dictionaries, crossword puzzle helpers
– announcement machines that provide information such
as stock quotes, airlines schedules, weather reports, etc.
66
67
Speech Recognition and Understanding
68
Thank You
69