The FFT (fast Fourier transform) is a basic algorithm in engineering applications, and optimizing its performance is important for promoting the adoption of Loongson processors. This paper exploits the hardware characteristics of the Loongson 3A processor: it optimizes the amount of computation and the bit-reversal reordering step, and it uses the Loongson 3A's 128-bit memory access instructions to lower the proportion of memory instructions, yielding an efficient FFT implementation. Experiments on an 825 MHz Loongson 3A processor show that the optimized one-dimensional FFT runs about 2.5 times as fast as the FFTW library, and the optimized two-dimensional FFT about 3 times as fast.
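For readers unfamiliar with the bit-reversal step mentioned above, the following is a minimal, portable C sketch of the standard in-place bit-reversal permutation used by radix-2 FFTs. It is not the optimized Loongson 3A routine described in this paper; the function names reverse_bits and bit_reverse_permute are hypothetical and chosen only for illustration, and the input length is assumed to be a power of two.

```c
#include <assert.h>
#include <complex.h>
#include <stddef.h>

/* Reverse the lowest `bits` bits of index i (illustrative helper). */
static size_t reverse_bits(size_t i, unsigned bits)
{
    size_t r = 0;
    for (unsigned b = 0; b < bits; ++b) {
        r = (r << 1) | (i & 1u);  /* move the lowest bit of i into r */
        i >>= 1;
    }
    return r;
}

/* In-place bit-reversal permutation of n = 2^log2n complex samples.
 * Assumption: x points to at least 2^log2n elements. */
static void bit_reverse_permute(double complex *x, unsigned log2n)
{
    assert(log2n < sizeof(size_t) * 8);
    size_t n = (size_t)1 << log2n;
    for (size_t i = 0; i < n; ++i) {
        size_t j = reverse_bits(i, log2n);
        if (i < j) {              /* swap each pair exactly once */
            double complex tmp = x[i];
            x[i] = x[j];
            x[j] = tmp;
        }
    }
}
```

For n = 8 (log2n = 3), the permutation places the input elements in the order 0, 4, 2, 6, 1, 5, 3, 7. This straightforward version recomputes the reversed index for every element, which is the kind of overhead an optimized reordering step would aim to reduce.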