DSP Architecture
TMS320C54x DSP Processor
Presented by:
Outline
Introduction
Architecture
Applications
Features
Instruction Set and addressing
FIR Filtering
Accelerating Polynomial Evaluation
Numerical Issues
Write code in C
Conclusion
Introduction [2]
TMS320C54x: a fixed-point digital signal processor (DSP) in the TMS320 family.
Low-power DSP: 0.54 mW/MIPS
Acceleration for FIR and LMS filtering, code book search,
[4]
Software Applications
Circular Buffers
Single-Instruction Repeat (RPT) Loops
Extended-Precision Arithmetic
Floating-Point Arithmetic
Application-Oriented Operations
Symmetric FIR Filters
Adaptive Filtering
Viterbi Algorithm for Channel Decoding
Immediate: operand is part of the instruction
    ADD #0FFh
Absolute: address of operand is part of the instruction
    LD *(LABEL), A
Register: operand is specified in a register
    READA DATA      ; data read from address in accumulator A
Direct: address of operand is an offset from the data page pointer (DP) or stack pointer (SP)
    ADD 010h,A
Indirect: address of operand is stored in a register
    Variants: offset addressing, register offset (AR1 + AR0), autoincrement/decrement, bit-reversed addressing, circular addressing
    ADD *AR1
    ADD *AR1(10)
    ADD *AR1+0
    ADD *AR1+
    ADD *AR1+B
    ADD *AR1+0B
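To make the circular and bit-reversed variants more concrete, the index arithmetic they perform can be sketched in C. This is a host-side illustration only: the function names `circ_advance` and `bit_reverse` are hypothetical, and the sketch works on array indices rather than on the auxiliary registers and BK register the hardware uses.

```c
#include <stddef.h>

/* Advance an index inside a circular buffer of bk elements.
 * Assumes 0 <= index < bk and step <= bk (hypothetical helper). */
size_t circ_advance(size_t index, size_t step, size_t bk)
{
    size_t next = index + step;
    return (next >= bk) ? next - bk : next;   /* wrap around at the end */
}

/* Reverse the lowest `bits` bits of value, as used to reorder FFT data.
 * Hypothetical helper; the hardware reaches the same ordering by
 * reverse-carry address updates instead of reversing each index. */
unsigned bit_reverse(unsigned value, unsigned bits)
{
    unsigned result = 0;
    unsigned i;
    for (i = 0; i < bits; i++) {
        result = (result << 1) | (value & 1u);
        value >>= 1;
    }
    return result;
}
```

For an 8-point FFT (3 index bits), `bit_reverse` maps index 1 to 4 and index 3 to 6.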
Logical
AND, BIT, BITF, CMPL, CMPM, OR, ROL, ROR, SFTA, SFTC, SFTL, XOR
Program Control
B, BC, CALL, CC, IDLE, INTR, NOP, RC, RET, RPT, RPTB, RPTZ, TRAP, XC
Application Specific
ABS, ABDST, DELAY, EXP, FIRS, LMS, MAX, MIN, NORM, POLY, RND, SAT, SQDST, SQUR, SQURA, SQURS
Data Management
LD, MAR, MV(D,K,M,P), ST
Notes
CMPL: complement
MAR: modify address reg.
CMPM: compare memory
MAS: multiply and subtract
; Addresses: ar4 N samples of x, ar5 h, ar6 input buffer, ar7 output buffer
; h stored as linear array of N elements (in prog. mem.)
; x stored as circular array of N elements (in data mem.)
; Modulo addressing prevents need to reinitialize regs each sample
; Moving filter coefficients from program to data memory is not shown
firtask: ld    #firDP,dp          ; initialize data page pointer
         stm   #frameSize-1,brc   ; compute 256 outputs
         rptbd firloop-1
         stm   #N,bk              ; FIR circular buffer size
         ld    *ar6+,a            ; load input value to accumulator a
         stl   a,*ar4+%           ; replace oldest sample with newest
         rptz  a,#(N-1)           ; zero accumulator a, do N taps
         mac   *ar4+0%,*ar5+0%,a  ; one tap, accumulate in a
         sth   a,*ar7+            ; store output sample to output buffer
firloop: ret
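The same per-sample flow (overwrite the oldest sample, then accumulate N taps while walking the circular buffer) can be sketched in C for checking the arithmetic on a host. The `fir_state` type and `fir_step` function are hypothetical names, and the sketch returns the raw accumulator rather than only the upper 16 bits that `sth` would store.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical state for a circular-buffer FIR filter. */
typedef struct {
    int16_t *x;    /* circular buffer holding the last n samples */
    size_t   n;    /* number of taps */
    size_t   pos;  /* index of the oldest sample */
} fir_state;

/* Process one input sample and return the accumulated sum of products. */
int32_t fir_step(fir_state *s, const int16_t *h, int16_t in)
{
    int32_t acc = 0;
    size_t idx, k;

    s->x[s->pos] = in;                  /* replace oldest sample with newest */
    idx = s->pos;
    for (k = 0; k < s->n; k++) {        /* h[0]*x[n] + h[1]*x[n-1] + ... */
        acc += (int32_t)h[k] * s->x[idx];
        idx = (idx == 0) ? s->n - 1 : idx - 1;   /* step back with wraparound */
    }
    s->pos = (s->pos + 1) % s->n;       /* next call overwrites the new oldest */
    return acc;
}
```

Feeding a unit impulse followed by zeros returns the coefficients in order, which is a quick way to confirm the indexing.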
symmetric or anti-symmetric
Symmetric coefficients using 2 multiplications and 3 additions
y[n] = h0 x[n] + h1 x[n-1] + h1 x[n-2] + h0 x[n-3]
y[n] = h0 (x[n] + x[n-3]) + h1 (x[n-1] + x[n-2])
Accelerated by FIRS (FIR Symmetric) instruction
x in two circular buffers
h in program memory
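The folding of symmetric taps can be checked with a short C sketch. The function names `fir_direct` and `fir_folded` are hypothetical, the window `x` is assumed to hold an even number of samples with the newest first, and `h_half` holds only the first half of the symmetric coefficients.

```c
#include <stddef.h>
#include <stdint.h>

/* Direct form: one multiplication per tap (n multiplications). */
int32_t fir_direct(const int16_t *x, const int16_t *h_half, size_t n)
{
    int32_t acc = 0;
    size_t k;
    for (k = 0; k < n; k++) {
        /* Symmetric coefficients: h[k] == h[n-1-k] */
        size_t hk = (k < n / 2) ? k : n - 1 - k;
        acc += (int32_t)h_half[hk] * x[k];
    }
    return acc;
}

/* Folded form: add the two samples that share a coefficient first,
 * so only n/2 multiplications are needed. */
int32_t fir_folded(const int16_t *x, const int16_t *h_half, size_t n)
{
    int32_t acc = 0;
    size_t k;
    for (k = 0; k < n / 2; k++) {
        acc += (int32_t)h_half[k] * ((int32_t)x[k] + x[n - 1 - k]);
    }
    return acc;
}
```

For the 4-tap case both functions compute h0 (x[n] + x[n-3]) + h1 (x[n-1] + x[n-2]).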
Architecture - FIRS
y(x) = c0 + c1 x + c2 x^2 + c3 x^3            Expanded form
y(x) = c0 + x (c1 + x (c2 + x (c3)))          Horner's form
POLY reduces 2 N cycles using MAC+ADD to N cycles
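Horner's form maps onto a single multiply-add per coefficient, which is the recurrence POLY repeats. A minimal C sketch, assuming plain integer arithmetic (no Q15 scaling or rounding) and a hypothetical function name `poly_horner`, with coefficients stored highest order first as in [c3 c2 c1 c0]:

```c
#include <stddef.h>
#include <stdint.h>

/* Evaluate a polynomial with Horner's rule.
 * c[0] is the highest-order coefficient, c[n-1] is the constant term. */
int32_t poly_horner(const int32_t *c, size_t n, int32_t x)
{
    int32_t a = 0;
    size_t i;
    for (i = 0; i < n; i++) {
        a = c[i] + x * a;   /* same recurrence as a = b + x*a in POLY */
    }
    return a;
}
```

With c = [1 2 3 4] and x = 2 the successive values of a are 1, 4, 11, 26, matching 1*8 + 2*4 + 3*2 + 4.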
; ar2 contains address of array [c3 c2 c1 c0]
; poly uses temporary register t for multiplicand x
; first two times poly instruction executes gives
;   1. a = c(3) + x * 0 = c(3);   b = c2
;   2. a = c(2) + x * c(3);       b = c1
        ld   *ar2+,16,b   ; b = c3 << 16
        ld   *ar3,t       ; t = x (ar3 contains addr of x)
        rptz a,#3         ; a = 0, repeat next inst. 4 times
        poly *ar2+        ; a = b + x*a || b = c(i-1) << 16
        sth  a,*ar4       ; store result (ar4 is addr of y)
Integer Multiplication
Integer multiplication yields products larger than the inputs, as
Does the user store the lower (1) or upper (8) result?
Fractional Multiplication
Multiplication of fractions yields products that never exceed
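The difference between the two interpretations can be sketched in C. This is a host-side illustration assuming the common Q15 convention (16-bit values scaled by 2^15); the names `int_mul16` and `q15_mul` are hypothetical. The one product that does not fit, (-1.0) x (-1.0), is clamped here.

```c
#include <stdint.h>

/* Integer view: a 16 x 16 product needs up to 32 bits. */
int32_t int_mul16(int16_t a, int16_t b)
{
    return (int32_t)a * (int32_t)b;
}

/* Fractional (Q15) view: keep the upper part of the product so the
 * result is again a 16-bit fraction. */
int16_t q15_mul(int16_t a, int16_t b)
{
    int32_t product = (int32_t)a * (int32_t)b;  /* Q30 product */
    /* Right shift of a negative value is implementation-defined in C;
     * mainstream compilers perform an arithmetic shift. */
    int32_t result = product >> 15;             /* back to Q15 */
    if (result > 32767) {
        result = 32767;                         /* (-1.0)*(-1.0) case */
    }
    return (int16_t)result;
}
```

For example, 0.5 x 0.5 is `q15_mul(16384, 16384)`, which gives 8192 (0.25 in Q15), while the integer product of the same inputs is 268435456.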
Accumulation
With fractions, we were able to guarantee that
Saturation (SAT)
SAT instruction saturates a value exceeding the 32-bit range in the selected accumulator:
    SAT A
    SAT B
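The effect of saturation can be sketched in C by modeling the accumulator, which is wider than 32 bits, with a 64-bit integer. The function name `sat32` is hypothetical, and the sketch only illustrates the clamping, not the status flags the processor updates.

```c
#include <stdint.h>

/* Clamp a wide accumulator value to the signed 32-bit range. */
int32_t sat32(int64_t acc)
{
    if (acc > INT32_MAX) {
        return INT32_MAX;   /* positive overflow: largest 32-bit value */
    }
    if (acc < INT32_MIN) {
        return INT32_MIN;   /* negative overflow: smallest 32-bit value */
    }
    return (int32_t)acc;    /* in range: unchanged */
}
```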
Non-gain Systems
Division
The C54x does not have a single-cycle 16-bit divide instruction
Divide is a rare function in DSP
Division hardware is expensive
The C54x does have a single-cycle 1-bit divide instruction: conditional subtract or SUBC
Division Routine
B = num*den (tells sign)
Strip sign of numerator
Strip sign of denominator
16 iterations of 1-bit divide
If result needs to be negative: invert sign and store negative result
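The routine above can be modeled in C to see how 16 conditional subtracts produce a quotient. This is a sketch under stated assumptions: `subc_step` imitates one conditional subtract with the divisor aligned 15 bits up, `div16` is a hypothetical wrapper that follows the listed steps, the denominator is assumed nonzero, and the result is returned as 32 bits so that -32768 / -1 does not overflow.

```c
#include <stdint.h>

/* One conditional-subtract step: if acc - (den << 15) is non-negative,
 * keep the difference, shift left, and set the new quotient bit;
 * otherwise just shift left. */
static uint32_t subc_step(uint32_t acc, uint16_t den)
{
    int64_t diff = (int64_t)acc - ((int64_t)den << 15);
    if (diff >= 0) {
        return ((uint32_t)diff << 1) + 1u;
    }
    return acc << 1;
}

/* Signed 16-bit division following the routine's steps (den != 0). */
int32_t div16(int16_t num, int16_t den)
{
    int negative = ((int32_t)num * (int32_t)den) < 0;   /* sign of result */
    uint16_t unum = (uint16_t)(num < 0 ? -(int32_t)num : num);
    uint16_t uden = (uint16_t)(den < 0 ? -(int32_t)den : den);
    uint32_t acc = unum;
    int i;

    for (i = 0; i < 16; i++) {          /* 16 iterations of 1-bit divide */
        acc = subc_step(acc, uden);
    }
    /* Low 16 bits hold the quotient, high 16 bits the remainder. */
    {
        int32_t quotient = (int32_t)(acc & 0xFFFFu);
        return negative ? -quotient : quotient;
    }
}
```

For 7 / 2 the accumulator ends at 0x00010003: quotient 3 in the low half, remainder 1 in the high half.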
Rounding
Result of multiplication can be rounded for MPY, MAC, and MAS operations. This is specified by appending the
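As an illustration of what rounding a 32-bit product to 16 bits involves, the following C sketch assumes the usual convention of adding 2^15 and then clearing the low 16 bits. The function name `round_to_high16` is hypothetical, and the clamp is a host-side guard rather than a description of the processor's overflow behavior.

```c
#include <stdint.h>

/* Round a 32-bit value to the nearest multiple of 2^16
 * (that is, round its upper 16 bits), ties rounding up. */
int32_t round_to_high16(int32_t acc)
{
    int64_t rounded = (int64_t)acc + 0x8000;   /* add 2^15 */
    if (rounded > INT32_MAX) {
        rounded = INT32_MAX;                   /* host-side overflow guard */
    }
    return (int32_t)(rounded & ~(int64_t)0xFFFF);  /* clear low 16 bits */
}
```

For example, 0x00018000 (1.5 in units of 2^16) rounds up to 0x00020000, while 0x00017FFF rounds down to 0x00010000.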
Write code in C
Inline Assembly
Allows direct access to assembly language from C
Useful for operating on components not used by C, e.g.:
from C
main C file retains portability
yields more easily maintained structures
eliminates risk of interfering with registers in use by C
Registers:
Read and write to the register as any other pointer:
    *SPC_REG = 0xC8;
    ioport unsigned port8000;
    x = port8000;
    port8000 = y;
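The pointer-style register access can be tried on a host with a small C sketch. Everything here is an assumption for illustration: `fake_spc_storage` stands in for the memory-mapped register, and `SPC_REG` is defined to point at it because the real register address is not given in these slides. The `ioport` keyword is specific to the TI compiler and is not modeled.

```c
#include <stdint.h>

/* Hypothetical stand-in for a memory-mapped register so the sketch
 * runs on a host; on the target, SPC_REG would be a fixed address. */
static volatile uint16_t fake_spc_storage;
#define SPC_REG (&fake_spc_storage)

/* Write the register through the pointer, as in *SPC_REG = 0xC8; */
void spc_write(uint16_t value)
{
    *SPC_REG = value;
}

/* Read the register back through the same pointer. */
uint16_t spc_read(void)
{
    return *SPC_REG;
}
```

The `volatile` qualifier keeps the compiler from caching or removing accesses, which matters when the location is real hardware.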
References
[1] Texas Instruments, TMS320C54x DSP Design Workshop, May 1997
[2] TMS320C54x User's Guide
[3] www.ti.com
[4] Prof. Brian L. Evans, Signal and Image Processing on the TMS320C54x DSP
[5] TMS320C54x Assembly Language Tools