

A Scalable, Fault-Tolerant Petascale Free-Mesh Numerical Simulation System

ISSN: 1000-1239
Journal: Journal of Computer Research and Development (《计算机研究与发展》)
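Classification: TP301.6 [Automation and Computer Technology: Computer System Architecture; Automation and Computer Technology: Computer Science and Technology] TP338.6 [Automation and Computer Technology: Computer System Architecture; Automation and Computer Technology: Computer Science and Technology]
Author affiliations: [1] High Performance Computer Research Center, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190; [2] University of Chinese Academy of Sciences, Beijing 100049
Funding: National Natural Science Foundation of China (11072241, 11111140020, 91130026); Oak Ridge National Laboratory / U.S. National Center for Computational Sciences Director's Fund project (MAT028)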

Authors: Li Leisheng (黎雷生)[1,2], Wang Chaowei (王朝尉)[2], Ma Zhitao (马志涛)[2], Huo Zhigang (霍志刚)[2], Tian Rong (田荣)[2]

Keywords: petascale computing, meshless/particle simulation, high scalability, fault tolerance, multithreading, MPI+Pthreads, dynamic load balancing

Abstract (translated from Chinese):

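Driven by petaflops-scale computing power, numerical software has reached a turning point characterized by massive parallelism, and scalability and fault tolerance have become the two key technologies for large-scale numerical simulation. petaPar is a next-generation general-purpose numerical simulation code developed for petascale computing; it starts from meshless methods, whose strengths complement those of traditional numerical techniques. Within a unified architecture, petaPar implements smoothed particle hydrodynamics (SPH) and the material point method (MPM), two of the most mature and effective meshless/particle algorithms, and supports a variety of strength models, failure models, and equations of state. Its MPM supports an improved contact algorithm that can handle the discontinuous deformation and interaction of more than a million discrete bodies. The system has the following features: 1) High scalability: computation and communication overlap completely even in the extreme case of one patch per core, and dynamic load balancing is supported. 2) Fault tolerance: unattended restart with a changed number of processes is supported, so the computation need not be aborted when a local hot fault occurs in the system hardware. 3) Adaptation to the trend toward heterogeneous hardware architectures: both flat MPI and MPI+Pthreads parallel models are supported. Full-system scalability tests on the Titan petaflops supercomputer show that the code scales linearly to 260,000 CPU cores, with parallel efficiencies of 100% for SPH and 96% for MPM.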

English abstract:

With the emergence of petaflops (10^15 FLOPS) systems, numerical simulation has entered a new era in which a single parallel run can use 10^4 to 10^6 processor cores. To take full advantage of petaflops and post-petaflops supercomputing infrastructures, a domain application must address two grand challenges: scalability and fault tolerance. petaPar is a highly scalable and fault-tolerant meshfree/particle simulation code dedicated to petascale computing. Two popular particle methods, smoothed particle hydrodynamics (SPH) and the material point method (MPM), are implemented in a unified object-oriented framework. The parallelization of both SPH and MPM consistently starts from the domain decomposition of a regular background grid. The scalability of the code is ensured by fully overlapping inter-MPI-process communication with computation and by a dynamic load balancing strategy. petaPar supports both flat MPI and hierarchical MPI+Pthreads parallelization. Application-specific lightweight checkpointing is used to provide fault tolerance. petaPar is designed to restart itself automatically on any number of MPI processes, allowing the computing resources to change dynamically in scenarios such as node failure or connection timeout. Experiments are performed on the Titan petaflops supercomputer. They show that petaPar scales linearly up to 2.6 × 10^5 CPU cores, with parallel efficiencies of 100% and 96% for the multithreaded SPH and the multithreaded MPM, respectively, and that the multithreaded SPH performs up to 30% better than the flat MPI implementation.
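
As a rough illustration of why a decomposition into patches of a regular background grid can make a restart on a different process count straightforward, the following Python sketch redistributes per-patch checkpoint records over a new number of ranks. The function names, the contiguous block assignment, and the checkpoint layout are all assumptions made for illustration; the abstract does not describe petaPar's actual checkpoint format or assignment policy.

```python
from typing import Dict, List


def assign_patches(num_patches: int, num_ranks: int) -> Dict[int, List[int]]:
    """Assign patch indices 0..num_patches-1 to ranks in contiguous, near-equal blocks."""
    if num_ranks <= 0:
        raise ValueError("num_ranks must be positive")
    base, extra = divmod(num_patches, num_ranks)
    owners: Dict[int, List[int]] = {}
    start = 0
    for rank in range(num_ranks):
        # The first `extra` ranks take one additional patch each.
        count = base + (1 if rank < extra else 0)
        owners[rank] = list(range(start, start + count))
        start += count
    return owners


def restart_layout(checkpoint: Dict[int, dict], num_ranks: int) -> Dict[int, Dict[int, dict]]:
    """Redistribute per-patch checkpoint records over a possibly different rank count."""
    patch_ids = sorted(checkpoint)
    owners = assign_patches(len(patch_ids), num_ranks)
    return {
        rank: {patch_ids[i]: checkpoint[patch_ids[i]] for i in indices}
        for rank, indices in owners.items()
    }


# Hypothetical checkpoint: one record per patch, keyed by patch id
# (illustrative data only, not a real petaPar file layout).
checkpoint = {pid: {"particles": 100 + pid} for pid in range(8)}

# Restart on 3 ranks, e.g. after one of 4 original ranks became unavailable.
layout = restart_layout(checkpoint, num_ranks=3)
```

Because each record in this sketch is keyed by patch rather than by the rank that wrote it, the restart only needs a new patch-to-rank map; a real code would also have to rebuild communication neighbours and rebalance load.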
