Index Mappings For The Fast Fourier Transform: IEEE Transactions On Signal Processing April 1996

Download as pdf or txt
Download as pdf or txt
You are on page 1of 4

See discussions, stats, and author profiles for this publication at: https://www.researchgate.

net/publication/3316018

Index mappings for the fast Fourier transform

Article  in  IEEE Transactions on Signal Processing · April 1996


DOI: 10.1109/78.489047 · Source: IEEE Xplore

CITATIONS READS

12 984

1 author:

James Schatzman
N-ask Inc
7 PUBLICATIONS   128 CITATIONS   

SEE PROFILE

Some of the authors of this publication are also working on these related projects:

DSP Basic Research View project

Seismology View project

All content following this page was uploaded by James Schatzman on 09 December 2015.

The user has requested enhancement of the downloaded file.


IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 44, NO. 3, MARCH 1996 717

polynomial," Trans. Inst. Elec. Commun. Eng. (in Japanese), vol. J68-A, TABLE I
pp. 173-180, 1985. SUMMATTON KERNEL FOR COMBINATIONS OF POPULAR MAPS.HERE
Z. Wang, "Pruning the fast discrete cosine transform," IEEE Trans. , P: = ( P ; ~ mod PZ) AND PL = (P;' mod PI)
Commun., vol. 39, pp. 640-643, May 1991.
S. B. Narayanan and K. M. M. Prabhu, "Fast Hartley transform pruning," Output Map
IEEE Trans. Signal Processing, vol. 39, pp. 230-233, Jan. 1991. Input

Output KI =PZ Ki=pz KI=P;PZ

Kz =PI K2=l Kz=P~ KZ=P:PI

Index Mappings for the Fast Fourier Transform


James C. Schatzman

Abstract- General analysis shows which linear index maps for the
multidimensional PFA avoid twiddle factors. For any input map of
this class, there is a unique output map which omits twiddle factors
and modifications of the underlying DFT subblocks. With subblock
modifications, all factorizations can be done in-place and in-order.

I. INTRODUCTION
The prime factor algorithm fast Fourier transform is based on the
reduction of a 1-D discrete fourier tranform to a multidimensional
DFT. Many have studied the underlying index mapping schemes:
where N = p1p2 for arbitrary natural numbers p l and p 2 . For the
Burrus (1977), Burrus and Eschenbacher (1981), Rothweiller (1982),
Bums and Parks (1985), Temperton (1985), Siu and Wong (1989), n * ( n 1 , n ~ map
) to be one-to-one requires (Burrus (1977)) the
Lun et al. (1989), Chan and Ho (1991), Johnsson et al. (1991), following.
and Temperton (1992). Temperton (1985) and (1992) and Burrus CaseA: gcd(p1,pZ) = 1:
and Eschenbacher (1981) describe in-place in-order FFT algorithms. Al) NI = a l p z , NZ # U Z ~ I , gcd(a1,pl) = 1, and
Steidl and Tasche (1989) and Lun and Siu (1993) give in-place and gcd(N2,pz) = 1; or
in-order algorithms for special factorizations. Linzer and Feig (1993) A21 NI # a m , NZ = anpi, gcd(N1,pl) = 1, and
show that incorporating direct permutation steps into the first or last gcd(az,pz) = 1; or
stage of the FFT does not degrade performance. Burrus (1977) and A3) NI = a1p2, NZ = a m , gcd(a1,pl) = 1, and
Lun and Siu (1993) give the clearest and most useful analyses of gcd(az,pz) = 1
linear index maps. This work builds on these results to obtain a
Case B: gcd(p1,pZ) > 1:
comprehensive theory.
B l ) NI = a m , NZ # azpl, gcd(a1,pl) = 1, and
gcd(Nz,p z ) = 1
11. TWO-DIMENSIONAL
FORMULATION
B2) NI # alpz, NZ = azpl, gcd(N1,pI) = 1, and
The discrete Fourier transform of length N may be written gcd(az,pz) = 1.
where a1 and a2 are integers. Particular maps of importance are
NI = 1, NZ = pl ("Trivial map," Cases A2/B2); NI = p 2 , Nz = 1
n=O ("Digit-reversing map," Cases Al/Bl); a1 = 1, a2 = 1, ("Ruritanian
map," Case A3); a1 = pT1 mod p l , a2 = pT1 mod p ~ ("Chinese
,
and the goal is to simplify the computation by reordering the input
Remainder Theorem (CRT) map," Case A3). Mapping xn to 2nl,n2
and output and by exploiting the redundancy of the transform. For
two factors (N = p l p z ) , the input index n and the output index ik and ck to 2kl,k2, (1) becomes
are generally mapped to two dimensions according to a certain linear
form modulo N, as follows:

Manuscript received August 29, 1994; revised May 31, 1995. The associate Considering all possible combinations of input and output maps
editor coordinating the review of this work and approving it for publication from these special cases, we obtain the results in Table I.
was Dr. K. J. Ray Liu. Using matrix notation, c' = F Z becomes c' = PG'EPrZ where
J. C. Schatzman is with the University of Wyoming, Laramie, WY 82071
USA (e-mail:[email protected]). PI and PO are the input/output map permutation matrices and E is
Publisher Item Identifier S 1053-587X(96)02384-5. the matrix from the Table II. In Table 11, T is the N by N diagonal

1053-587X/96$05.00 0 1996 IEEE


718 E E E TRANSACTIONS ON SIGNAL PROCESSrNG, VOL 44, NO. 3, MARCH 1996

TABLE I1 twiddle factors must belong to case A3 (easily shown). If the n and
MAPPEDDISCRETE
FOURIER TRANSFORM WRI’MENIN k maps both belong to Case A3, with corresponding integers a1 , uz
MATRIXNOTATIONFOR COMBINATIONS OF POPULAR ) b l , bz for k * ( k l , k z ) ,we obtain
for n tf ( n l , n ~and
MAPS.@ DENOTES
THE KRONECKER MATRIXmtooucr
Output Map

Input Trivial Digit-Reversing Ruritanian CHT

Output

nz=O n1=0
TrWlal

NI = 1 which amounts to the 2-D transformation Fa:2b2p11 @ FLY1b1P21.


N2 = P I
The case of identical input and output maps is of special interest.
Digit-Reversing
As pointed out by many authors, a straightforward in-place in-
order algorithm results. Although the algorithm requires complicated
NI = PZ
addressing, as Lun and Siu (1993) point out, the addressing can be
Nz = 1 accomplished by indirect addressing or in hardware “using a circular
Ruritanian buffer.” Unfortunately, indirect addressing is inefficient on most high-
NI= P Z speed general purpose computers because of the increased memory
references required. On many processors, therefore, this theoretical
Nz = P I
property of being “in-place and in-order” may be of little or no
CRT
advantage; explicit mapping is often preferable. Even so, we note
NI = P ~ P Z that it is possible for identical input and output maps to give a result
NZ = P ~ P I similar to the crossed RuritanidCRT maps: a direct 2-D DFT with
no “rotations.” This occurs if and only if a?pz m o d p l = 1 and
a2pl mod p2 = 1; equivalently, if and only if N; p z mod p 1 and
N2 = - p1 mod p z , or that p l and p z are quadratic residues of each
matrix of twiddle factors other. For example, p l = 5, pa = 4, NI= 8 , N2 = 15, gives a map
that is neither Ruritanian nor CRT, and when used for both input
and output maps gives a factorization without “rotations.” There are
many factorizations for which this can be made to work; Lun and
2?r
A, = diag(e-’T’;j = 0,1,2,..., 4 - 1). (4) Siu [7] list some.

Fa’]is the length p “rotated” DFT formed by taking the rth power of IlI. M-DIMENSIONAL
FORMULATION
each clement of F p . The starred entries are those than cannot usefully While a multifactor PFA can be implemented recursively with two-
be reduced to 2-D computations. From Table II we see the historically factor maps, with some restrictions the factorizations extend to more
important result that combining Ruritanian and CRT maps leads to than two factors. The PFA with M factors involves index mappings
trne 2-D DFT’s (Fp2@ Fpl) that require no twiddle factors (as in n ++ (nl,...,R M ) . Consider first the M = 3 case. Composing two
Temperton (1985)). That there are other maps that achieve this result 2-D maps for factors p l , p z , p 3 , we obtain
is seen below.
The double use of the Ruritanian or CRT maps leads to true
2-D transformations that require no twiddle factors, but in this
case the underlying DFT operator is “rotated.” (It is impossible
for p i = p i = 1 unless either of p l or p z is unity, uninter- Provided that the outer map is of case A2, A3 or B2, NZ is a
esting cases.) Temperton (1992) claims that if an algorithm exists multiple of p l so (6) simplifies to
for computing the DFT with Fp for any p , prime or composite,
possibly using the FFT, the Corresponding matrix-vector product with +
n = ( N ~ n l NZMlnz + NzM2n3) mod N . (7)
Fa‘’ may be computed at the same cost as for Fp by modifying
the algorithm in two ways: i) Change the multiplier constants in We may redefine Nz and obtain
each submodule, and ii) raise all twiddle factors (if any) to the
rth power. However, this argument ignores the fact that many of n = (Nini +Nznz t N ~ n 3m
) odN;O<n3<pJ-1,j = 1,2,3.

the “multiplier constants” in DFT modules are used implicitly or (8)


By induction, provided that all but the inner map are of case A2,
are modified. Therefore, a nontnvial coding effort is required to
A3 or B2, the composition of M maps leads to a linear form modulo
convert an Fp module to Fag].However, there are no more than
N , as follows:
p - 1 modules Fag1 of interest, so the coding effort is modest
provided that p is small. Each can be implemented as efficiently
as Fp, and if necessary a test and branch can select the appropriate
+
n = ( N ~ n l N2nz + ... + Nn/rnn/r)mod N (9)
submodule. Thus, with nearly zero additional computational cost but where n = 0, 1,...)N - 1,n, = 0, 1,...,p 3 - 1,and N = p l p 2 ...p M .
some additional coding and maintenance cost over Fp,FF1can be It is straightforward to show that there is only one other case that leads
implemented. to (9), which is when all maps are either trivial or digit-reversing. By
The Ruritanian and CRT maps are not unique in eliminating twiddle induction, composition of any M case A3 maps leads to (9) where
factors, as pointed out in Lun and Siu (1993). We approach this also N, = a, NIP,. The constraint that the mapping be one-to-one
question in a more general framework. A mapping that eliminates requires that gcd(a,,p,) = 1.
IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 44, NO. 3, MARCH 1996 719

Applying maps of the type (9) to input and output, we obtain without requiring indirect addressing or special hardware. For M >
PM-1 P 2 - 1 Pl-1
2, there are no case A3 maps that have inverses representable as
linear forms modulo N , so recursive algorithms are necessary to
obtain this benefit.

IV. CONCLUSION
(10)
For any factorization, there are sets of corresponding input and
It is straightforward but somewhat tedious to show that the twiddle output linear modulo N maps that decompose a 1-D DFT of length
factors disappear if and only if N , I<-[mod N = 0 for all distinct j N to a multidimensional DFT, possibly modified. It is the class
and 1. This holds if and only if all maps are of case A3 (“if’ is trivial, A3 maps that avoid twiddle factors, and for every A3 input map
“only if’ follows by contradiction). When all maps are of case A3, there is exactly one corresponding output map (also A3) that does
and there are therefore no twiddle factors, we obtain not require modifying the subblocks. For general purpose processors,
the in-placehn-order property may be less important than the speed
with which explicit permutations may be performed. The author
recommends two-cycle recursive implementations that require only
swaps, and hence are trivially in-place but not in-order. Subblock
modifications are required, however.
Rigorous mathematical proofs are available for all claims in this
or paper. Interested parties may request copies from the author.

\
REFERENCES
which shows that the underlying subblocks Fp3 are modified by [l] C. S. Burrus, “Index mappings for multidimensional formulation of the
taking the corresponding powers a , b , N / p , of each element. The DFT convolution,” IEEE Trans. Acoust., Speech, Signal Processing, vol.
subblocks are “rotated” unless a, b, m o d p , = 1 for each j. ASSP-25, pp. 239-242, 1977.
Suppose the input map (i.e., the set of coefficients a,) is given. [2] C. S. Burms and P. W. Eschenbacher, “An in-place, in-order prime
factor FFT algorithm,” LEEE Trans. Acoust., Speech, Signal, Processing,
Elementary number theory tells us that these equations have no vol. 20, , pp. 806-817, 1981.
solution unless g c d ( p , , N l p , ) = 1 , i.e., that the factors p, are [3] C. S. Burrus and T. W. Parks, DFT/FFT and Convolutional Algorithms:
relatively prime. If the factors are relatively prime, then the solutions Theory and Implementation. New York Wiley, 1985.
are simply b, = (F)
m o d p , . Since without loss of generality
[4] S. C. Chan and K. L. Ho, “On indexing the prime factor fast Fourier
transform algorithm,” IEEE Trans. Circuits Syst., vol. 38, pp. 951-953,
1 5 IC, 5 N - 1 and h‘, = 6, N l p , , also without loss of generality 1991.
we may assume that 1 5 b, 5 p , - 1 . Thus, we have obtained the [5] S . L. Johnsson et al., “Communication efficient multi-processor FFT,J.
Computat. Physics,” vol. 102, pp. 381-397, 1991.
result that for any case A3 input map, there is exactly one linear [6] E. N. Linzer and E. Feig, “Implementation of efficient fft algorithms on
output map that converts the original DFT to a multidimensional fused multiply-add architectures,” IEEE Trans. Signal Processing, vol.
DFT with no twiddle factors and no modifications of the basic 41, pp. 93-107, 1993.
subblocks. Using the same A3 map for input and output leads to the [7] D. P .K. Lun and W. C. Siu, “An analysis for the realization of
same conditions as before for the absense of “rotations”, as follows: an in-place and in-order prime factor algorithm,” IEEE Trans. Signal
Processing, vol. 41, 1993.
( a ; N / p , ) m o d p , = 1 or N,” G N / p , m o d p , . [8] D. P. K. Lun et al., “On performance evaluation and address generation
PG1 may not be representable as a linear form modulo N . In of prime factor algorithms,” in Proc. Int. Symp. Computer Architect.
practice, this does not significantly complicate coding. However, Digital Signal Processing, pp. 352-357, 1989.
some maps having this property are especially attractive. There are [9] J. H. Rothweiller, “Implementation of the in-order prime factor trans-
form for variable sizes,” IEEE Trans. Acoust., Speech, Signal Processing,
no 3-D or higher dimensional A3 maps that have linear inverses. For vol. 30, , pp. 105-107, 1982.
2-D general linear maps, only some A3 maps have linear inverse [lo] W. C. Siu and K. L. Wong, “Fast address generation for the computation
maps. Specifically, for given p l and p 2 , the A3 map given by of prime factor algorithms,” in Proc. Int. Con$ Acoust. Speech, Signal
a1 = p;’ modpl, a2 = ( z p l ) m o d p 2 , z = l a l p z / p 1 J is the Processing, 1989.
unique A3 map that has a linear inverse; it is its own inverse. For [ 111 G. Steidl and M. Tasche, “Index transforms for multidimensional DFT’s
and convolutions,” Numerische Mathematik, vol. 56, pp. 5 13-528, 1989.
the same p l and p 2 , there are as many as p z - 2 A3 maps that [12] C. Temperton, “Implementation of a self-sorting, in-place prime factor
have A2 inverses, satisfying al = p ; l m o d p l , K1 m o d p l = 1 , FFT algorithm,” J. Computat. Physics, vol. 58, pp. 283-299, 1985.
a2 = l + rzpP: K 1 modp2, bz = a;’ m o d p , , b2 > 0 . These are
[13] -, “A generalized prime factor FFI algorithm for any N =
necessary and sufficient conditions for the parameters of the forward 2 P 3 q 5 r , ” SIAM J. Sci. Statist. Comput., vol. 13, 676-686, 1992.
( N I = a l p z , NZ = a2p1) and inverse ( N I = K 1 , N Z = b 2 p l ) maps.
The cases with A3 inverses are particularly simple and efficient to
implement, as the maps are 2-cycles. For example, p l = 2, p z = 3,
N I = 3, N Z = 4, produces the map

which is obviously its own inverse. PFA implementations that use


two-factor two-cycle mapping schemes (recursively) are especially
simple and efficient to implement. Such maps are trivially in-place

View publication stats

You might also like