To address the problem that applications developed for or ported to CPU-GPU heterogeneous parallel systems are often insufficiently optimized, this paper proposes an approach that combines asymptotic fitting optimization with source-to-source compilation. The approach translates C programs annotated with directives into CUDA programs and profiles the generated CUDA code multiple times. Based on the characteristics of the source program and the target hardware, it then completes the source-to-source compilation and optimization automatically. A prototype system based on this approach has been implemented. Functional and performance tests of the prototype in different environments show that the generated CUDA programs are functionally equivalent to the original C programs and run substantially faster. Comparisons on CUDA benchmark programs further show that the generated code clearly outperforms code produced by other source-to-source compilers.