Index Mappings For The Fast Fourier Transform: IEEE Transactions On Signal Processing April 1996
Index Mappings For The Fast Fourier Transform: IEEE Transactions On Signal Processing April 1996
Index Mappings For The Fast Fourier Transform: IEEE Transactions On Signal Processing April 1996
net/publication/3316018
CITATIONS READS
12 984
1 author:
James Schatzman
N-ask Inc
7 PUBLICATIONS 128 CITATIONS
SEE PROFILE
Some of the authors of this publication are also working on these related projects:
All content following this page was uploaded by James Schatzman on 09 December 2015.
polynomial," Trans. Inst. Elec. Commun. Eng. (in Japanese), vol. J68-A, TABLE I
pp. 173-180, 1985. SUMMATTON KERNEL FOR COMBINATIONS OF POPULAR MAPS.HERE
Z. Wang, "Pruning the fast discrete cosine transform," IEEE Trans. , P: = ( P ; ~ mod PZ) AND PL = (P;' mod PI)
Commun., vol. 39, pp. 640-643, May 1991.
S. B. Narayanan and K. M. M. Prabhu, "Fast Hartley transform pruning," Output Map
IEEE Trans. Signal Processing, vol. 39, pp. 230-233, Jan. 1991. Input
Abstract- General analysis shows which linear index maps for the
multidimensional PFA avoid twiddle factors. For any input map of
this class, there is a unique output map which omits twiddle factors
and modifications of the underlying DFT subblocks. With subblock
modifications, all factorizations can be done in-place and in-order.
I. INTRODUCTION
The prime factor algorithm fast Fourier transform is based on the
reduction of a 1-D discrete fourier tranform to a multidimensional
DFT. Many have studied the underlying index mapping schemes:
where N = p1p2 for arbitrary natural numbers p l and p 2 . For the
Burrus (1977), Burrus and Eschenbacher (1981), Rothweiller (1982),
Bums and Parks (1985), Temperton (1985), Siu and Wong (1989), n * ( n 1 , n ~ map
) to be one-to-one requires (Burrus (1977)) the
Lun et al. (1989), Chan and Ho (1991), Johnsson et al. (1991), following.
and Temperton (1992). Temperton (1985) and (1992) and Burrus CaseA: gcd(p1,pZ) = 1:
and Eschenbacher (1981) describe in-place in-order FFT algorithms. Al) NI = a l p z , NZ # U Z ~ I , gcd(a1,pl) = 1, and
Steidl and Tasche (1989) and Lun and Siu (1993) give in-place and gcd(N2,pz) = 1; or
in-order algorithms for special factorizations. Linzer and Feig (1993) A21 NI # a m , NZ = anpi, gcd(N1,pl) = 1, and
show that incorporating direct permutation steps into the first or last gcd(az,pz) = 1; or
stage of the FFT does not degrade performance. Burrus (1977) and A3) NI = a1p2, NZ = a m , gcd(a1,pl) = 1, and
Lun and Siu (1993) give the clearest and most useful analyses of gcd(az,pz) = 1
linear index maps. This work builds on these results to obtain a
Case B: gcd(p1,pZ) > 1:
comprehensive theory.
B l ) NI = a m , NZ # azpl, gcd(a1,pl) = 1, and
gcd(Nz,p z ) = 1
11. TWO-DIMENSIONAL
FORMULATION
B2) NI # alpz, NZ = azpl, gcd(N1,pI) = 1, and
The discrete Fourier transform of length N may be written gcd(az,pz) = 1.
where a1 and a2 are integers. Particular maps of importance are
NI = 1, NZ = pl ("Trivial map," Cases A2/B2); NI = p 2 , Nz = 1
n=O ("Digit-reversing map," Cases Al/Bl); a1 = 1, a2 = 1, ("Ruritanian
map," Case A3); a1 = pT1 mod p l , a2 = pT1 mod p ~ ("Chinese
,
and the goal is to simplify the computation by reordering the input
Remainder Theorem (CRT) map," Case A3). Mapping xn to 2nl,n2
and output and by exploiting the redundancy of the transform. For
two factors (N = p l p z ) , the input index n and the output index ik and ck to 2kl,k2, (1) becomes
are generally mapped to two dimensions according to a certain linear
form modulo N, as follows:
Manuscript received August 29, 1994; revised May 31, 1995. The associate Considering all possible combinations of input and output maps
editor coordinating the review of this work and approving it for publication from these special cases, we obtain the results in Table I.
was Dr. K. J. Ray Liu. Using matrix notation, c' = F Z becomes c' = PG'EPrZ where
J. C. Schatzman is with the University of Wyoming, Laramie, WY 82071
USA (e-mail:[email protected]). PI and PO are the input/output map permutation matrices and E is
Publisher Item Identifier S 1053-587X(96)02384-5. the matrix from the Table II. In Table 11, T is the N by N diagonal
TABLE I1 twiddle factors must belong to case A3 (easily shown). If the n and
MAPPEDDISCRETE
FOURIER TRANSFORM WRI’MENIN k maps both belong to Case A3, with corresponding integers a1 , uz
MATRIXNOTATIONFOR COMBINATIONS OF POPULAR ) b l , bz for k * ( k l , k z ) ,we obtain
for n tf ( n l , n ~and
MAPS.@ DENOTES
THE KRONECKER MATRIXmtooucr
Output Map
Output
nz=O n1=0
TrWlal
Fa’]is the length p “rotated” DFT formed by taking the rth power of IlI. M-DIMENSIONAL
FORMULATION
each clement of F p . The starred entries are those than cannot usefully While a multifactor PFA can be implemented recursively with two-
be reduced to 2-D computations. From Table II we see the historically factor maps, with some restrictions the factorizations extend to more
important result that combining Ruritanian and CRT maps leads to than two factors. The PFA with M factors involves index mappings
trne 2-D DFT’s (Fp2@ Fpl) that require no twiddle factors (as in n ++ (nl,...,R M ) . Consider first the M = 3 case. Composing two
Temperton (1985)). That there are other maps that achieve this result 2-D maps for factors p l , p z , p 3 , we obtain
is seen below.
The double use of the Ruritanian or CRT maps leads to true
2-D transformations that require no twiddle factors, but in this
case the underlying DFT operator is “rotated.” (It is impossible
for p i = p i = 1 unless either of p l or p z is unity, uninter- Provided that the outer map is of case A2, A3 or B2, NZ is a
esting cases.) Temperton (1992) claims that if an algorithm exists multiple of p l so (6) simplifies to
for computing the DFT with Fp for any p , prime or composite,
possibly using the FFT, the Corresponding matrix-vector product with +
n = ( N ~ n l NZMlnz + NzM2n3) mod N . (7)
Fa‘’ may be computed at the same cost as for Fp by modifying
the algorithm in two ways: i) Change the multiplier constants in We may redefine Nz and obtain
each submodule, and ii) raise all twiddle factors (if any) to the
rth power. However, this argument ignores the fact that many of n = (Nini +Nznz t N ~ n 3m
) odN;O<n3<pJ-1,j = 1,2,3.
Applying maps of the type (9) to input and output, we obtain without requiring indirect addressing or special hardware. For M >
PM-1 P 2 - 1 Pl-1
2, there are no case A3 maps that have inverses representable as
linear forms modulo N , so recursive algorithms are necessary to
obtain this benefit.
IV. CONCLUSION
(10)
For any factorization, there are sets of corresponding input and
It is straightforward but somewhat tedious to show that the twiddle output linear modulo N maps that decompose a 1-D DFT of length
factors disappear if and only if N , I<-[mod N = 0 for all distinct j N to a multidimensional DFT, possibly modified. It is the class
and 1. This holds if and only if all maps are of case A3 (“if’ is trivial, A3 maps that avoid twiddle factors, and for every A3 input map
“only if’ follows by contradiction). When all maps are of case A3, there is exactly one corresponding output map (also A3) that does
and there are therefore no twiddle factors, we obtain not require modifying the subblocks. For general purpose processors,
the in-placehn-order property may be less important than the speed
with which explicit permutations may be performed. The author
recommends two-cycle recursive implementations that require only
swaps, and hence are trivially in-place but not in-order. Subblock
modifications are required, however.
Rigorous mathematical proofs are available for all claims in this
or paper. Interested parties may request copies from the author.
\
REFERENCES
which shows that the underlying subblocks Fp3 are modified by [l] C. S. Burrus, “Index mappings for multidimensional formulation of the
taking the corresponding powers a , b , N / p , of each element. The DFT convolution,” IEEE Trans. Acoust., Speech, Signal Processing, vol.
subblocks are “rotated” unless a, b, m o d p , = 1 for each j. ASSP-25, pp. 239-242, 1977.
Suppose the input map (i.e., the set of coefficients a,) is given. [2] C. S. Burms and P. W. Eschenbacher, “An in-place, in-order prime
factor FFT algorithm,” LEEE Trans. Acoust., Speech, Signal, Processing,
Elementary number theory tells us that these equations have no vol. 20, , pp. 806-817, 1981.
solution unless g c d ( p , , N l p , ) = 1 , i.e., that the factors p, are [3] C. S. Burrus and T. W. Parks, DFT/FFT and Convolutional Algorithms:
relatively prime. If the factors are relatively prime, then the solutions Theory and Implementation. New York Wiley, 1985.
are simply b, = (F)
m o d p , . Since without loss of generality
[4] S. C. Chan and K. L. Ho, “On indexing the prime factor fast Fourier
transform algorithm,” IEEE Trans. Circuits Syst., vol. 38, pp. 951-953,
1 5 IC, 5 N - 1 and h‘, = 6, N l p , , also without loss of generality 1991.
we may assume that 1 5 b, 5 p , - 1 . Thus, we have obtained the [5] S . L. Johnsson et al., “Communication efficient multi-processor FFT,J.
Computat. Physics,” vol. 102, pp. 381-397, 1991.
result that for any case A3 input map, there is exactly one linear [6] E. N. Linzer and E. Feig, “Implementation of efficient fft algorithms on
output map that converts the original DFT to a multidimensional fused multiply-add architectures,” IEEE Trans. Signal Processing, vol.
DFT with no twiddle factors and no modifications of the basic 41, pp. 93-107, 1993.
subblocks. Using the same A3 map for input and output leads to the [7] D. P .K. Lun and W. C. Siu, “An analysis for the realization of
same conditions as before for the absense of “rotations”, as follows: an in-place and in-order prime factor algorithm,” IEEE Trans. Signal
Processing, vol. 41, 1993.
( a ; N / p , ) m o d p , = 1 or N,” G N / p , m o d p , . [8] D. P. K. Lun et al., “On performance evaluation and address generation
PG1 may not be representable as a linear form modulo N . In of prime factor algorithms,” in Proc. Int. Symp. Computer Architect.
practice, this does not significantly complicate coding. However, Digital Signal Processing, pp. 352-357, 1989.
some maps having this property are especially attractive. There are [9] J. H. Rothweiller, “Implementation of the in-order prime factor trans-
form for variable sizes,” IEEE Trans. Acoust., Speech, Signal Processing,
no 3-D or higher dimensional A3 maps that have linear inverses. For vol. 30, , pp. 105-107, 1982.
2-D general linear maps, only some A3 maps have linear inverse [lo] W. C. Siu and K. L. Wong, “Fast address generation for the computation
maps. Specifically, for given p l and p 2 , the A3 map given by of prime factor algorithms,” in Proc. Int. Con$ Acoust. Speech, Signal
a1 = p;’ modpl, a2 = ( z p l ) m o d p 2 , z = l a l p z / p 1 J is the Processing, 1989.
unique A3 map that has a linear inverse; it is its own inverse. For [ 111 G. Steidl and M. Tasche, “Index transforms for multidimensional DFT’s
and convolutions,” Numerische Mathematik, vol. 56, pp. 5 13-528, 1989.
the same p l and p 2 , there are as many as p z - 2 A3 maps that [12] C. Temperton, “Implementation of a self-sorting, in-place prime factor
have A2 inverses, satisfying al = p ; l m o d p l , K1 m o d p l = 1 , FFT algorithm,” J. Computat. Physics, vol. 58, pp. 283-299, 1985.
a2 = l + rzpP: K 1 modp2, bz = a;’ m o d p , , b2 > 0 . These are
[13] -, “A generalized prime factor FFI algorithm for any N =
necessary and sufficient conditions for the parameters of the forward 2 P 3 q 5 r , ” SIAM J. Sci. Statist. Comput., vol. 13, 676-686, 1992.
( N I = a l p z , NZ = a2p1) and inverse ( N I = K 1 , N Z = b 2 p l ) maps.
The cases with A3 inverses are particularly simple and efficient to
implement, as the maps are 2-cycles. For example, p l = 2, p z = 3,
N I = 3, N Z = 4, produces the map