A Performance-Adaptive FFT Framework on Heterogeneous Platforms

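ISSN: 1000-1239
Journal: 计算机研究与发展 (Journal of Computer Research and Development)
Date: 2014.3.2
Pages: 1-13
Classification: TP311 [Automation and Computer Technology: Computer Software and Theory; Automation and Computer Technology: Computer Science and Technology]
Author affiliations: [1] Laboratory of Parallel Software and Computational Science (Institute of Software, Chinese Academy of Sciences), Beijing 100190; [2] University of Chinese Academy of Sciences, Beijing 100049; [3] State Key Laboratory of Computer Architecture (Institute of Computing Technology, Chinese Academy of Sciences), Beijing 100190
Funding: National Natural Science Foundation of China (61221062); National High Technology Research and Development Program of China, "863" Program (2012AA010902, 2012AA010903); Chinese Academy of Sciences Special Fund for Graduate Science and Technology Innovation and Social Practice (11000GBF01)
Related project: Research on Ultra-Parallel High-Productivity Computer Architecture and Design Methods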

Authors: Li Yan (李焱), Zhang Yunquan (张云泉)

Keywords: fast Fourier transform (FFT), auto-tuning performance optimization, accelerated processing unit (APU), graphics processing unit (GPU), heterogeneous

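Chinese abstract (translated):

The fast Fourier transform (FFT) is widely used in science and engineering, especially in signal processing, image processing, and the solution of partial differential equations. Targeting heterogeneous platforms based on graphics processing units (GPUs) and accelerated processing units (APUs), this paper proposes MPFFT (massively parallel FFT), a massively parallel FFT framework with adaptive performance optimization. MPFFT uses a two-level adaptation strategy, at installation time and at runtime. At installation time, a code generator produces a library of code templates (codelets) for arbitrary lengths, which are called by GPU kernels. At runtime, auto-tuning techniques guide the code generator to produce highly optimized GPU compute code. Experimental results show that on the APU platform, the average speedups of MPFFT over AMD clAmdFft 1.6 for 1D, 2D, and 3D FFTs are 3.45, 15.20, and 4.47, respectively; on the AMD HD7970 GPU, the average speedups are 1.75, 3.01, and 1.69. On the NVIDIA Tesla C2050 GPU, overall performance reaches 93% of CUFFT 4.1, with a maximum speedup of 1.28.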

English abstract:

The fast Fourier transform (FFT) is an important computational kernel in scientific and engineering computing, with broad applicability in signal processing, image processing, and the solution of partial differential equations. In this paper, we propose an automatic performance tuning framework called MPFFT (massively parallel FFT), designed for heterogeneous platforms built on GPUs (graphics processing units) and APUs (accelerated processing units). MPFFT uses a two-level adaptation strategy, at installation time and at runtime. At installation time, a code generator automatically produces FFT codelets of arbitrary size, which are called by GPU kernels. At runtime, the code generator also produces highly optimized GPU kernel code, guided by auto-tuning techniques. Experimental results show that MPFFT substantially outperforms the clAmdFft library on both the AMD GPU and the APU. For 1D, 2D, and 3D FFTs, the average speedups of MPFFT over clAmdFft 1.6 are 3.45, 15.20, and 4.47 on the AMD APU A-360, and 1.75, 3.01, and 1.69 on the AMD HD7970. MPFFT also achieves performance comparable to the CUFFT library on an NVIDIA GPU: on the Tesla C2050, overall performance reaches 93% of CUFFT 4.1, with a maximum speedup of 1.28.
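The two-level strategy described in the abstract can be illustrated with a minimal CPU-only Python sketch. It is not the MPFFT implementation, which generates GPU code: the function names (`generate_codelet`, `fft`, `autotune`), the choice of radix as the only tuning parameter, and the timing-based search are all assumptions made for illustration. The "installation-time" step emits unrolled small-DFT codelets as source code, and the "runtime" step times candidate radices on the requested size and keeps the fastest.

```python
import cmath
import time


def generate_codelet(n):
    """Install-time step (illustrative): emit and compile unrolled source
    for a size-n DFT with the twiddle factors baked in as constants."""
    lines = [f"def dft_{n}(x):", "    return ["]
    for k in range(n):
        terms = []
        for j in range(n):
            w = cmath.exp(-2j * cmath.pi * j * k / n)
            terms.append(f"x[{j}] * complex({w.real!r}, {w.imag!r})")
        lines.append("        " + " + ".join(terms) + ",")
    lines.append("    ]")
    namespace = {}
    exec("\n".join(lines), namespace)
    return namespace[f"dft_{n}"]


def fft(x, radix, codelet):
    """Radix-r decimation-in-time FFT; len(x) must be a power of radix.
    The generated codelet performs each size-r butterfly."""
    n = len(x)
    if n == 1:
        return list(x)
    m = n // radix
    # Transform the r interleaved subsequences recursively.
    subs = [fft(x[q::radix], radix, codelet) for q in range(radix)]
    out = [0j] * n
    for k in range(m):
        # Apply twiddle factors, then one size-r DFT across the subsequences.
        t = [subs[q][k] * cmath.exp(-2j * cmath.pi * q * k / n)
             for q in range(radix)]
        y = codelet(t)
        for p in range(radix):
            out[k + m * p] = y[p]
    return out


def is_power_of(n, r):
    """Return True if n == r**e for some integer e >= 0."""
    while n > 1 and n % r == 0:
        n //= r
    return n == 1


def autotune(n, candidates, codelets, trials=3):
    """Runtime step (illustrative): time each usable radix on a sample
    input of length n and return the fastest one."""
    sample = [complex(i % 7, 0) for i in range(n)]
    best_radix, best_time = None, float("inf")
    for r in candidates:
        if not is_power_of(n, r):
            continue  # this sketch only handles pure radix-r sizes
        start = time.perf_counter()
        for _ in range(trials):
            fft(sample, r, codelets[r])
        elapsed = time.perf_counter() - start
        if elapsed < best_time:
            best_radix, best_time = r, elapsed
    return best_radix


if __name__ == "__main__":
    # "Installation": build the codelet library once.
    codelets = {r: generate_codelet(r) for r in (2, 4, 8)}
    # "Runtime": pick the best radix for the requested problem size.
    n = 4096
    radix = autotune(n, (2, 4, 8), codelets)
    print(f"selected radix {radix} for n = {n}")
```

The selected radix depends on the machine, so no particular output is expected. A real auto-tuner for GPUs would search a larger space, for example work-group size and memory layout, but the abstract does not specify which parameters MPFFT tunes.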
