As computing evolves from "central processing" on the CPU to "co-processing" on the CPU and GPU, the performance of multicore GPUs has increased dramatically in recent years, and GPU computing has become a major direction of development in high-performance computing. Mainstream molecular dynamics (MD) simulation packages have adopted GPU technology, and LAMMPS was among the first to offer a general-purpose parallel GPU version. In this paper, we build a small 4-node GPU cluster for LAMMPS-based MD simulation using Tesla C2050 GPUs, which are based on NVIDIA's latest Fermi architecture, and test its performance with a simulation of the melting of argon atoms. The tests cover a CPU cluster, a single node with a single GPU, a single node with multiple GPUs, and a multi-node GPU cluster. We compare the speed-up in each case, discuss the causes of the performance differences, analyze the bottlenecks of GPU parallel clusters for MD simulation, and propose possible solutions. In particular, taking the capacity of the PCI bus into account when building a cluster greatly benefits cluster efficiency. The results show that the cluster performs well: compared with a single machine and with a CPU cluster, the number of particles that can be simulated increases greatly, and the speed-up exceeds 20 times. We expect multi-GPU parallel computing to be the direction of development for MD simulation in the near future.