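As the demand for high-performance computing keeps growing, multicore processors have become widely adopted in high-performance computing. To keep high-performance computer systems efficient, computation and communication must stay in balance, and the widespread use of multicore processors places higher demands on the efficiency of the communication system. Collective communication is an important component of the communication system, so studying efficient collective communication in multicore environments is of great significance. This paper first studies how multicore affects collective communication performance. Based on two characteristics of multicore processors, shared caches and memory contention, it then proposes hierarchical algorithms, limited concurrency, NUMA-aware optimization, and cache-friendly optimization algorithms, which are validated in MPI_Barrier, MPI_Bcast and MPI_Alltoall respectively. Experimental results show that the optimization methods make effective use of the multicore architecture, reduce the impact of contention, and improve the performance and scalability of collective communication in multicore environments.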
With the rapid growth in HPC computing requirements, multicore processors are widely deployed in HPC systems. To keep applications efficient on large-scale systems, it is very important to maintain the balance between communication and computation, so multicore places higher demands on the communication system. Collective communication is an important part of the communication system and is critical to overall system performance, so it is important to study the impact of the multicore environment on collective performance. This paper first analyzes how multicore affects collective communication. It is found that multicore SMP clusters have two conflicting effects: they provide a faster intra-socket communication path, which can speed up collective communication, but they also introduce memory and cache contention, which may degrade communication performance. Based on these observations, this paper proposes multicore-aware collective optimization techniques, including hierarchy-aware algorithms, limited concurrency, NUMA-aware algorithms and cache-friendly optimization. These optimization methods are implemented in MPI_Barrier, MPI_Bcast and MPI_Alltoall. Experiments show that the proposed algorithms improve the performance and scalability of collective communication.
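The hierarchy-aware idea can be illustrated with a two-level broadcast schedule. The sketch below is not the paper's implementation; it assumes block placement of ranks onto nodes (rank r on node r // cores_per_node), a binomial tree at both levels, and the hypothetical helper names `binomial_tree_edges` and `hierarchical_bcast_schedule`. It only computes who sends to whom, so that exactly one message per remote node crosses the network while the remaining traffic uses the faster intra-node path.

```python
from typing import List, Tuple


def binomial_tree_edges(members: List[int]) -> List[Tuple[int, int]]:
    """Binomial-tree broadcast edges over `members`, rooted at members[0]."""
    edges = []
    n = len(members)
    mask = 1
    while mask < n:
        # In each round, every index i < mask already holds the data
        # and forwards it to index i + mask.
        for i in range(mask):
            j = i + mask
            if j < n:
                edges.append((members[i], members[j]))
        mask <<= 1
    return edges


def hierarchical_bcast_schedule(nprocs: int, cores_per_node: int, root: int = 0):
    """Two-level broadcast: inter-node among node leaders, then intra-node.

    Assumption (hypothetical): block placement, rank r lives on node
    r // cores_per_node. Returns (level, sender, receiver) tuples in a
    valid execution order.
    """
    num_nodes = (nprocs + cores_per_node - 1) // cores_per_node
    root_node = root // cores_per_node
    # Leader of each node: the root on its own node, the lowest rank elsewhere.
    leaders = [root if node == root_node else node * cores_per_node
               for node in range(num_nodes)]
    # Phase 1: binomial tree among leaders, with the root's leader first.
    order = [leaders[root_node]] + [l for node, l in enumerate(leaders)
                                    if node != root_node]
    schedule = [("inter", s, d) for s, d in binomial_tree_edges(order)]
    # Phase 2: each leader broadcasts to the other ranks on its own node.
    for node in range(num_nodes):
        first = node * cores_per_node
        local = list(range(first, min(first + cores_per_node, nprocs)))
        leader = leaders[node]
        local = [leader] + [r for r in local if r != leader]
        schedule += [("intra", s, d) for s, d in binomial_tree_edges(local)]
    return schedule
```

For example, `hierarchical_bcast_schedule(10, 4, root=5)` sends only two inter-node messages (one to each remote node leader) and keeps every other transfer inside a node. In an MPI library, the two levels would typically be formed as sub-communicators, for example with `MPI_Comm_split`.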