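Numerical simulation is the primary tool for research in planetary fluid dynamics. This paper presents the numerical simulation of planetary fluid dynamics on CPU-MIC heterogeneous many-core platforms, computing and simulating the magnetohydrodynamic motion of the Earth's outer core. Building on existing work, it adds support for numerical simulation in the CPU-MIC heterogeneous many-core environment. First, the simulation workflow in the CPU-MIC heterogeneous many-core environment is described, and an implementation of a distributed parallel GMRES(m) many-core solver on MIC is given. Second, a distributed parallel algorithm on MIC is implemented for sparse matrix-vector multiplication (SpMV), the computational kernel of the solver; this SpMV overlaps computation with communication and data transfer with computation. Third, to accelerate convergence of the planetary fluid dynamics equations, a distributed parallel polynomial preconditioner on MIC is given, with SpMV as its basic operation. Finally, several optimizations for the MIC many-core platform are proposed, such as multithreading, streaming stores, and data-transfer optimization. Numerical simulations on Tianhe-2 show that, compared with the CPU version, simulation in the CPU-MIC heterogeneous many-core environment achieves speedups of 6.93x and 6.0x on a single MIC card and on 64 MIC cards, respectively.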
Massively parallel computing is becoming a primary tool for the numerical simulation of planetary fluid dynamics. This paper studies the numerical simulation of planetary fluid dynamics on distributed-memory Xeon Phi-accelerated systems. First, we start from a legacy parallel code [1-3] built on the PETSc software package; it employs a pure MPI approach to parallel computing and, to date, lacks support for multi-threaded parallelism on many-core accelerated systems. We extend this legacy code to multi-threaded parallelism on Xeon Phi-accelerated systems. Furthermore, based on PETSc, we present and optimize a sparse linear solver for Xeon Phi-accelerated clusters that uses the restarted generalized minimal residual method (GMRES(m)). Second, we propose a novel sparse matrix-vector multiplication (SpMV) algorithm for Xeon Phi-accelerated clusters; it makes highly aggressive use of asynchrony across offload, computation, and communication, all of which serve to overlap computation with communication. Moreover, based on our SpMV algorithm, we give a polynomial preconditioner that consists mainly of SpMV operations and that hides and reduces communication, whether to local memory, across the network, or over PCIe. Finally, several optimizations are applied to the extended code. Experiments on the Tianhe-2 supercomputer show that, compared with the original code, our Xeon Phi-accelerated design delivers speedups of 6.93x and 6.00x on a single MIC device and on 64 MIC devices, respectively.
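To make concrete the idea of a polynomial preconditioner whose only matrix operation is SpMV, the following serial sketch may help. It is not the authors' implementation: it assumes a truncated Neumann series as the polynomial (the abstract does not say which polynomial is used), it stores the matrix in CSR format, and it omits MPI, offload to the MIC, and multithreading entirely. The names `csr_spmv`, `neumann_precondition`, `omega`, and `degree` are hypothetical and chosen only for illustration.

```python
def csr_spmv(row_ptr, col_idx, values, x):
    """Return y = A @ x for a matrix A stored in CSR format."""
    y = [0.0] * (len(row_ptr) - 1)
    for i in range(len(y)):
        acc = 0.0
        # Nonzeros of row i occupy values[row_ptr[i]:row_ptr[i + 1]].
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        y[i] = acc
    return y


def neumann_precondition(row_ptr, col_idx, values, v, omega, degree):
    """Approximate A^{-1} v by sum_{k=0}^{degree} (I - omega*A)^k (omega*v).

    Assumption: omega is chosen so that the spectral radius of
    (I - omega*A) is below 1; otherwise the series does not converge.
    Each loop iteration costs exactly one SpMV plus vector updates.
    """
    term = [omega * vi for vi in v]      # k = 0 term
    z = list(term)                       # running sum of the series
    for _ in range(degree):
        a_term = csr_spmv(row_ptr, col_idx, values, term)
        # term <- (I - omega*A) @ term
        term = [t - omega * a for t, a in zip(term, a_term)]
        z = [zi + t for zi, t in zip(z, term)]
    return z


if __name__ == "__main__":
    # 2x2 example matrix [[4, 1], [1, 3]] in CSR form (illustrative data).
    row_ptr, col_idx, values = [0, 2, 4], [0, 1, 0, 1], [4.0, 1.0, 1.0, 3.0]
    z = neumann_precondition(row_ptr, col_idx, values, [1.0, 2.0], 0.2, 30)
    print(csr_spmv(row_ptr, col_idx, values, z))  # should be close to [1, 2]
```

Because the preconditioner is applied purely through repeated SpMV calls, any overlap of computation and communication achieved inside the SpMV kernel carries over to the preconditioner, which is presumably why the abstract builds one on top of the other.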