对非2次幂长度的海量数据FFT处理器设计,采用补零技术会造成巨大硬件资源的浪费,且影响算法性能.提出了一种适合于硬件实现,可处理数据长度为q×2“的FFT算法(q为非2质数)以及基于此算法的FFT处理器设计方法.提出的操作数地址映射方法充分利用了算法的同址特性,使得在最少的存储空间需求下,达到最大的数据并行性;设计的混合运算单元有效地统一了混合基和q点DFT运算,减少了运算部件的资源占用率,使得多个运算单元的并行成为可能.仿真结果表明,计算16位20480点DFT运算需要7181个时钟周期,系统频率达到了105MHz.不仅有效地扩展了FFT处理器的数据处理范围,同时满足SAR等实时系统对处理速度的要求.
For the design of FFT processor of massive data with length N≠2m, zero-padding technique always results in massive wastes of hardware resource, and reduces the processor performance significantly. Presented in this paper is a new FFT algorithm for length N=q×2m, where q is a prime integer except for 2. Under the condition of a wider range of choices on different sequence lengths, comparisons with previously reported algorithms show that this algorithm not only makes substantial savings on arithmetic, but also has more regular computational structure which is of great benefit to hardware implementation of the algorithm. Based on this algorithm, a design method of parallel architecture FFT processor is also presented. The dedicated parallel memory mapping algorithm with the feature of minimal memory size relies on the in-place calculation property of the PCW FFT algorithm, and can simultaneously access to q× 4 or 4×q data. A hybrid arithmetic unit unifies the arithmetic structures of mixed radix FFT and WFTA of q-point. Compared with separate arithmetic units, this unit can save hardware resource substantially, which makes more arithmetic units in one processor possible. The hardware simulation of the FFT processor shows that a complex 20480-point DFT operation can be done within 7181 clock cycles under the max clock frequency 105 MHz.