Driven by petaflops computing power, the development of numerical software has entered a historic turning point characterized by massive parallelism, in which scalability and fault tolerance have become the two key technologies for large-scale numerical simulation. petaPar is a next-generation general-purpose simulation code developed for petascale computing, built around meshfree methods, which complement traditional grid-based numerical techniques. Within a unified framework, petaPar implements the two most mature and effective meshfree/particle algorithms, smoothed particle hydrodynamics (SPH) and the material point method (MPM), and supports a variety of strength models, failure models and equations of state; the MPM component provides an improved contact algorithm that can handle the discontinuous deformation and interaction of millions of discrete bodies. The code has the following features: 1) high scalability: computation and communication overlap completely even in the extreme case of a single patch per core, and dynamic load balancing is supported; 2) fault tolerance: unattended restart with a changed number of processes is supported, so the computation need not be aborted when localized hardware faults occur at run time; 3) adaptation to the trend toward heterogeneous hardware architectures, with support for both the flat MPI and the MPI+Pthreads parallel models. Full-system scalability tests on the Titan petaflops supercomputer show that the code scales linearly to 260,000 CPU cores, with parallel efficiencies of 100% for SPH and 96% for MPM.
With the emergence of petaflops (10¹⁵ FLOPS) systems, numerical simulation has entered a new era, one that opens the possibility of using 10⁴ to 10⁶ processor cores in a single parallel run. To take full advantage of the power of petaflops and post-petaflops supercomputing infrastructures, two grand challenges, scalability and fault tolerance, must be addressed in a domain application. petaPar is a highly scalable and fault-tolerant meshfree/particle simulation code dedicated to petascale computing. Two popular particle methods, smoothed particle hydrodynamics (SPH) and the material point method (MPM), are implemented in a unified object-oriented framework. The parallelization of both SPH and MPM starts consistently from the domain decomposition of a regular background grid. The scalability of the code is assured by fully overlapping inter-process MPI communication with computation and by a dynamic load-balancing strategy. petaPar supports both flat MPI and hierarchical MPI+Pthreads parallelization. Application-specific lightweight checkpointing is used to address fault tolerance: petaPar is designed to restart automatically from any number of MPI processes, allowing a dynamic change of computing resources in scenarios such as node failure or connection timeout. Experiments are performed on the Titan petaflops supercomputer. They show that petaPar scales linearly up to 2.6 × 10⁵ CPU cores with excellent parallel efficiencies of 100% and 96% for the multithreaded SPH and the multithreaded MPM, respectively, and that the performance of the multithreaded SPH is improved by up to 30% compared with the flat MPI implementation.
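The communication/computation overlap credited above for the code's scalability can be illustrated with a minimal halo-exchange sketch. This is an illustration only, not petaPar's source: it assumes a 1-D decomposition with left/right neighbours, and the routines update_interior() and update_boundary() are hypothetical placeholders for the SPH/MPM patch updates. The idea is to post non-blocking ghost exchanges, advance the interior patches while messages are in flight, and finish the boundary patches once the exchange completes.

#include <mpi.h>
#include <vector>

// Hypothetical per-patch work; a real code would run the SPH/MPM kernels here.
void update_interior() { /* advance patches that need no ghost data */ }
void update_boundary(const std::vector<double>& ghost_left,
                     const std::vector<double>& ghost_right) {
    (void)ghost_left; (void)ghost_right;  /* advance patches at the subdomain boundary */
}

void advance_one_step(MPI_Comm comm, int left, int right,
                      std::vector<double>& send_left,
                      std::vector<double>& send_right,
                      std::vector<double>& recv_left,
                      std::vector<double>& recv_right) {
    MPI_Request reqs[4];

    // 1) Post non-blocking receives and sends for the ghost layers.
    MPI_Irecv(recv_left.data(),  static_cast<int>(recv_left.size()),  MPI_DOUBLE, left,  0, comm, &reqs[0]);
    MPI_Irecv(recv_right.data(), static_cast<int>(recv_right.size()), MPI_DOUBLE, right, 1, comm, &reqs[1]);
    MPI_Isend(send_left.data(),  static_cast<int>(send_left.size()),  MPI_DOUBLE, left,  1, comm, &reqs[2]);
    MPI_Isend(send_right.data(), static_cast<int>(send_right.size()), MPI_DOUBLE, right, 0, comm, &reqs[3]);

    // 2) Advance interior patches that need no ghost data;
    //    this work hides the message latency.
    update_interior();

    // 3) Complete the halo exchange, then advance the boundary patches.
    MPI_Waitall(4, reqs, MPI_STATUSES_IGNORE);
    update_boundary(recv_left, recv_right);
}

In an MPI+Pthreads run, step 2 would additionally be split across worker threads within the node, which is consistent with the hierarchical parallelization described above, although the threading details are not shown here.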