A New Opportunity for the Development of CAE Software in China
  • ISSN: 1006-0871
  • Journal: 计算机辅助工程 (Computer Aided Engineering)
  • Year: 2011
  • Pages: 141-143+147
  • Classification: TP391 [Automation and Computer Technology: Computer Application Technology; Automation and Computer Technology: Computer Science and Technology]
  • Author affiliations: [1] Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190; [2] Institute of Software, Chinese Academy of Sciences, Beijing 100190; [3] CAEP Software Center for High Performance Numerical Simulation, China Academy of Engineering Physics, Beijing 100088
  • Funding: Supported by the National Natural Science Foundation of China (11472274, 11072241, 11111140020, 91130026) and the Director's Fund of Oak Ridge National Laboratory / National Center for Computational Sciences, USA (MAT028, CSC153).
  • Related project: Research on efficient and scalable particle/meshfree algorithms for petascale computing
Author: 田荣 (Tian Rong)
Chinese abstract (translated):

At present, GPU acceleration of the Smoothed Particle Hydrodynamics (SPH) method is almost always based on the simplified Euler governing equations; GPU implementations of the complete Navier-Stokes equations are rare, and the literature describes their difficulties, optimization strategies, and speedups only vaguely. Moreover, the CPU-GPU cooperation scheme strongly affects the overall efficiency of a heterogeneous platform, so GPU acceleration models deserve further study. The goal of this paper is to efficiently accelerate petaPar, an in-house SPH application based on the Navier-Stokes equations, on heterogeneous platforms.

We first analyze the computational characteristics of the Euler and Navier-Stokes equations from the viewpoint of their mathematical formulations, and summarize the difficulties the Navier-Stokes equations pose for GPU acceleration. The Euler equations involve only simple scalar and vector calculations, yielding a typical lightweight, compute-intensive kernel well suited to GPUs. The complete Navier-Stokes equations, in contrast, involve complicated material constitutive models and extensive tensor computations, and thus face the problems of a big kernel on the GPU, such as heavy memory traffic, insufficient cache, low occupancy, and register spilling. We optimize the Navier-Stokes particle-interaction kernel by reducing the number of particle properties, extracting operations into the particle-update kernel, exploiting particle reuse, and maximizing GPU occupancy; implementation details are given in Section 5.1.

We also survey three GPU acceleration models: hot-spot acceleration, full-GPU acceleration, and peer-to-peer cooperation, analyzing their development cost, application scope, and theoretical speedup, and we study the communication optimization strategies of the peer-to-peer model in depth. Because communication particles are distributed discontinuously, extracting, inserting, and deleting them on the GPU side are in essence parallel operations on non-contiguous memory, which seriously degrades CPU-GPU synchronization; the existing literature does not address this problem. We solve it by improving the particle indexing rule: when sorting particles, we consider not only the cell index but also the cell type; implementation details are given in Section 5.2.3.

We implement and analyze the three GPU acceleration models for both the Euler and the Navier-Stokes equations. Test results show that under the three models the Euler equations achieve speedups of 8x, 33x, and 36x, and the Navier-Stokes equations achieve 6x, 15x, and 20x, respectively. In all cases, full-GPU acceleration surpasses hot-spot acceleration's …
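The abstract's particle-indexing fix (sort by cell type as well as cell index, so communication particles land in contiguous memory) can be sketched as follows. This is a minimal illustration, not the paper's code; the cell-type codes and dictionary layout are assumptions.

```python
# Illustrative sketch of the improved particle indexing rule: sort particles
# by (cell_type, cell_index) instead of cell_index alone, so communication
# (halo) particles become contiguous and can be moved with one bulk copy.
INTERIOR, HALO = 0, 1  # assumed cell-type codes, not from the paper

def sort_particles(particles):
    # particles: list of dicts with "cell_index" and "cell_type" keys
    return sorted(particles, key=lambda p: (p["cell_type"], p["cell_index"]))

particles = [
    {"id": 0, "cell_index": 5, "cell_type": HALO},
    {"id": 1, "cell_index": 2, "cell_type": INTERIOR},
    {"id": 2, "cell_index": 7, "cell_type": HALO},
    {"id": 3, "cell_index": 3, "cell_type": INTERIOR},
]
ordered = sort_particles(particles)

# All interior particles now precede all halo particles, so the halo block
# is a single contiguous slice rather than scattered entries.
halo_start = next(i for i, p in enumerate(ordered) if p["cell_type"] == HALO)
halo_block = ordered[halo_start:]
```

With cell type as the leading sort key, extraction, insertion, and deletion of communication particles turn into operations on one contiguous region, which is the property the abstract says is needed for efficient CPU-GPU synchronization.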

English abstract:

The existing GPU-accelerated codes of the Smoothed Particle Hydrodynamics method mostly focus on the simplified Euler equations rather than the complete Navier-Stokes equations. Besides, the current GPU acceleration models seem not to be "optimal". There is a question that needs to be answered: what is the most efficient GPU acceleration model for an application code, especially for the Navier-Stokes equations. In this paper, we analyzed the computing features of the Euler equations and the Navier-Stokes equations mathematically, and summed up the difficulties in GPU acceleration of Navier-Stokes kernels. The Euler kernel is lightweight, since only simple scalar and vector calculations are involved. However, the Navier-Stokes equations involve complicated constitutive models and tensor computations, resulting in big-kernel issues on the GPU, such as heavy memory access, low occupancy, and register spilling. Kernel optimization strategies of reducing particle properties, extracting operations from the interaction kernel to the updating kernel, utilizing particle reusability, and maximizing GPU occupancy are introduced, as described in Section 5.1. Meanwhile, we investigated three GPU acceleration models: hot-spot acceleration (run hotspots on the GPU), GPU-entire (finish the whole computing process on the GPU), and peer2peer acceleration (treat CPU and GPU as equivalent processors). The three models are analyzed from the perspectives of development cost, application scope, and theoretical speedup, and the communication optimization strategies of the peer2peer model are addressed in detail. Because of the discontinuous distribution of communication particles, extracting, inserting, and deleting them on the GPU are actually parallel operations over discontinuous memory, which seriously affect CPU-GPU synchronization but have not been addressed in the literature. We solved the problem by improving the particle indexing rule, considering not only cell index but also cell type when ordering particles, as described in Section 5.2.3.
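Why the hot-spot model's theoretical speedup is bounded can be seen with an Amdahl-style argument. The sketch below is my illustration, not a formula or data from the paper; the fraction f and kernel speedup s are assumed values.

```python
# Amdahl-style bound on hot-spot acceleration (illustrative; the numbers
# below are assumptions, not measurements from the paper): if a fraction f
# of runtime is hotspot code sped up by a factor s on the GPU, the overall
# speedup is 1 / ((1 - f) + f / s), capped at 1 / (1 - f) as s grows.
def amdahl_speedup(f, s):
    return 1.0 / ((1.0 - f) + f / s)

# With f = 0.9 and a 36x kernel speedup, the residual CPU work dominates
# and the overall speedup cannot exceed 1 / (1 - 0.9) = 10x.
print(round(amdahl_speedup(0.9, 36.0), 2))  # prints 8.0
```

This is the intuition behind the abstract's finding that full-GPU acceleration beats hot-spot acceleration: moving the entire computation onto the GPU removes the serial CPU fraction that caps the hot-spot model.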

Journal information
  • 《计算机辅助工程》 (Computer Aided Engineering)
  • Supervising authority:
  • Sponsor: Shanghai Maritime University
  • Editor-in-chief: 程景云 (Cheng Jingyun)
  • Address: Mailbox A30, 1550 Haigang Avenue, Lingang New City, Shanghai
  • Postal code: 201306
  • Email: smucae@163.com
  • Phone: 021-38284908
  • International standard serial number: ISSN 1006-0871
  • Domestic unified serial number: CN 31-1679/TP
  • Postal distribution code:
  • Awards:
  • Indexed in: Index Copernicus (Poland), Cambridge Scientific Abstracts (USA), Science Abstracts / INSPEC (UK)
  • Citation count: 3590