Existing GPU-accelerated implementations of the Smoothed Particle Hydrodynamics (SPH) method are almost exclusively based on the simplified Euler equations; GPU implementations of the complete Navier-Stokes equations are rare, and their difficulties, optimization strategies, and achievable speedups are only vaguely described in the literature. Moreover, the CPU-GPU cooperation scheme strongly affects the overall efficiency of a heterogeneous platform, so GPU acceleration models deserve further study; in particular, which acceleration model is the most efficient for a real application code, especially one based on the Navier-Stokes equations. The goal of this paper is to efficiently accelerate petaPar, our in-house SPH code based on the Navier-Stokes equations, on heterogeneous platforms. We first analyze the computing features of the Euler and Navier-Stokes equations from the perspective of their mathematical formulations and summarize the difficulties of accelerating the Navier-Stokes equations on the GPU. The Euler equations involve only simple scalar and vector calculations and yield a typical compute-intensive, lightweight kernel well suited to the GPU. The complete Navier-Stokes equations, by contrast, involve complicated constitutive models and a large amount of tensor computation, and therefore face the problems of big kernels on the GPU, such as heavy memory traffic, insufficient cache, low occupancy, and register spilling. We optimize the Navier-Stokes particle-interaction kernel by reducing particle properties, extracting operations from the interaction kernel into the update kernel, exploiting particle reusability, and maximizing GPU occupancy; the implementation is described in Section 5.1.
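To make the "extract operations into the update kernel" strategy concrete, the following is a minimal CUDA sketch under assumed names and data layout (a flattened 3x3 stress tensor per particle; none of this is petaPar's actual API). In the SPH momentum equation the pairwise term contains σ_i/ρ_i², which depends only on particle i, so it can be computed once per particle instead of once per neighbor pair:

```cuda
// Hypothetical per-particle precomputation kernel: O(N) work.
// Moving this out of the interaction kernel removes a division and a
// 9-component tensor scaling from every pairwise evaluation, shrinking
// the interaction kernel's register footprint and memory traffic.
__global__ void precomputeStressTerm(int n,
                                     const float* __restrict__ rho,
                                     const float* __restrict__ sigma,   // flattened 3x3 tensors
                                     float* __restrict__ sigmaOverRho2) // sigma_i / rho_i^2
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    float inv = 1.0f / (rho[i] * rho[i]);
    for (int c = 0; c < 9; ++c)            // all 9 tensor components
        sigmaOverRho2[9 * i + c] = sigma[9 * i + c] * inv;
}
```

The interaction kernel then merely loads the precomputed term for each particle, which directly targets the register-spilling and occupancy symptoms of the big-kernel problem described above.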
Meanwhile, we investigate three GPU acceleration models: hot-spot acceleration (running only the hotspots on the GPU), full-GPU acceleration (finishing the whole computing process on the GPU), and peer-to-peer cooperation (treating the CPU and GPU as equivalent processors). The three models are analyzed in terms of development cost, application scope, and theoretical speedup, and the communication optimization strategies of the peer-to-peer model are addressed in detail. Because communication particles are distributed discontinuously, extracting, inserting, and deleting them on the GPU are in essence parallel operations over discontinuous memory, which severely hurt CPU-GPU synchronization; this problem has not been addressed in the literature. We solve it by improving the particle indexing rule: particles are ordered not only by cell index but also by cell type, as described in Section 5.2.3. The three acceleration models are implemented and analyzed for both the Euler and the Navier-Stokes equations. Test results show that under the three models the Euler equations achieve speedups of 8x, 33x, and 36x, and the Navier-Stokes equations achieve speedups of 6x, 15x, and 20x, respectively; in both cases full-GPU acceleration surpasses the theoretical speedup limit of hot-spot acceleration.
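As a minimal sketch of the improved indexing rule, the composite sort key below places the cell type in the high bits and the cell index in the low bits (the names, the 64-bit key layout, and the two-type cell classification are illustrative assumptions, not petaPar's actual implementation):

```cuda
#include <thrust/device_vector.h>
#include <thrust/sort.h>
#include <cstdint>

// Assumed two-way classification of cells; a real code may need more
// types, e.g. one border type per neighboring process.
enum CellType : uint32_t { INNER = 0, BORDER = 1 };

// Composite 64-bit sort key: cell type in the high bits, cell index in
// the low bits, so ordering by key groups particles first by type.
__host__ __device__ inline uint64_t makeKey(uint32_t type, uint32_t cell)
{
    return (static_cast<uint64_t>(type) << 32) | cell;
}

// Sorting by the composite key gathers all BORDER particles into one
// contiguous block, so extracting them for CPU-GPU exchange becomes a
// single linear copy instead of a scattered gather over discontinuous
// memory.
void reorderParticles(thrust::device_vector<uint64_t>& keys,
                      thrust::device_vector<int>&      particleIds)
{
    thrust::sort_by_key(keys.begin(), keys.end(), particleIds.begin());
}
```

After the sort, the offset and length of the BORDER block can be located with a binary search over the keys (e.g., thrust::lower_bound), after which one contiguous copy covers all communication particles.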