Dongli Research Big Data Discovery System (DRDS)

Parallel Design and Optimization of a Galaxy Group Finding Algorithm: SGI System vs. Distributed Cluster

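ISSN: 1002-137X
Journal: Computer Science (计算机科学)
Date: 0
CLC classification: TP391 [Automation and Computer Technology / Computer Application Technology; Automation and Computer Technology / Computer Science and Technology]
Affiliations: [1] Center for High Performance Computing, Shanghai Jiao Tong University, Shanghai 200240; [2] NVIDIA, Singapore 138522
Funding: National Key R&D Program of China (2016YFB0201400, 2016YFB0201800); RONPAKU Program of the Japan Society for the Promotion of Science (JSPS); Shanghai Jiao Tong University SMC-Morningstar Young Scholar Award Program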

Authors: SI Yumeng[1], WEI Jianwen[1], Simon SEE[1,2], LIN Xinhua[1]

Keywords: High performance computing, Galaxy group finding, Parallel computing, UPC, OpenMP

Chinese abstract (translated):

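The Halo-based Galaxy Group Finder (HGGF) is an effective galaxy group finding algorithm. It groups galaxies according to several attributes, including spatial position, redshift, and mass, and thereby provides an important basis for research on the formation and evolution of galaxy groups. However, the current OpenMP implementation of the algorithm can use only the resources of a single node, which limits its application to large-scale galaxy grouping problems. One approach is multi-node parallelism, which lets the program use more resources to solve larger grouping problems while shortening execution time; the algorithm therefore has to be redesigned and reimplemented. A major challenge is the large number of semi-random remote memory accesses in the program, which severely degrade performance in a multi-node environment. To overcome this, the design introduces adjacent galaxy lists, and the program is implemented in Unified Parallel C (UPC). For the kernel code, speedups of 2.25, 2.78, and 5.07 are achieved on 4, 8, and 16 nodes, respectively, and the memory required per node is also significantly reduced. Experiments with the OpenMP version on an SGI UV 2000 show that, constrained by the program's memory access pattern and the machine's architecture, programs with random data access such as HGGF can hardly exploit the large thread counts and memory of NUMA shared-memory systems to obtain high speedups directly. A two-level parallel design on distributed-memory clusters, which makes better use of the principle of locality, may be a better solution.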
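As a rough reading of the reported kernel speedups (2.25, 2.78, and 5.07 on 4, 8, and 16 nodes), and assuming they are measured relative to a single node (the page does not state the baseline), the per-node parallel efficiency E_N = S_N / N works out as follows:

```latex
E_N = \frac{S_N}{N}, \qquad
E_4 = \frac{2.25}{4} \approx 0.56, \quad
E_8 = \frac{2.78}{8} \approx 0.35, \quad
E_{16} = \frac{5.07}{16} \approx 0.32
```

Going from 8 to 16 nodes raises the speedup by a factor of 5.07 / 2.78 ≈ 1.82, close to the ideal factor of 2 for a doubling of nodes.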

English abstract:

Halo-based galaxy group finder (HGGF) is an effective algorithm that finds galaxy groups based on galaxy coordinates, redshift, mass, etc., and provides great help in research on galaxy group formation and evolution. However, the current pure OpenMP implementation of the algorithm is limited by the resources of a single compute node when dealing with large-scale group finding problems. One possible solution is to use resources from multiple nodes, reducing execution time while solving larger galaxy group finding problems. Therefore, it is essential to redesign and reimplement the algorithm. The major hurdle for such an attempt is remote memory access caused by semi-random galaxy access in the algorithm, which damages performance in a multi-node environment. To tackle this problem, we parallelized the algorithm with an adjacent galaxy list design and implemented it in Unified Parallel C (UPC). Speedups of 2.25, 2.78, and 5.07 for the kernel were achieved with 4, 8, and 16 nodes, respectively. Meanwhile, the memory requirement on each node was also reduced significantly. Experiments with the OpenMP version of the algorithm on an SGI UV 2000 show that, due to the nature of the program and the features of the NUMA architecture, programs with random memory access behavior like HGGF may not readily benefit from the large number of threads and the shared memory provided by such machines. A two-level parallel design that takes advantage of the locality principle on distributed-memory clusters may be a better solution.
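The abstract does not describe the adjacent galaxy list in detail, so the following is only a hypothetical Python sketch of the general idea, not the authors' UPC implementation. It assumes galaxies are points with 3D coordinates and that "adjacent" means within a fixed search radius `cutoff` (the real HGGF criterion uses projected distance and redshift separation). It builds each galaxy's neighbor list with a uniform cell grid, then computes which galaxies a given rank must obtain from other ranks. The names `build_adjacent_lists`, `remote_galaxies`, `cutoff`, and `owner` are all illustrative.

```python
import math
from collections import defaultdict
from itertools import product


def build_adjacent_lists(positions, cutoff):
    """For each galaxy, return the sorted indices of other galaxies within `cutoff`.

    Hypothetical adjacency criterion: 3D Euclidean distance <= cutoff.
    The grid cell size equals `cutoff`, so every neighbour lies in one of
    the 27 cells surrounding a galaxy's own cell.
    """
    def cell_of(p):
        return tuple(math.floor(c / cutoff) for c in p)

    cells = defaultdict(list)
    for i, p in enumerate(positions):
        cells[cell_of(p)].append(i)

    cutoff_sq = cutoff * cutoff
    adjacent = [[] for _ in positions]
    for i, (x, y, z) in enumerate(positions):
        cx, cy, cz = cell_of((x, y, z))
        for dx, dy, dz in product((-1, 0, 1), repeat=3):
            for j in cells.get((cx + dx, cy + dy, cz + dz), ()):
                if j == i:
                    continue
                px, py, pz = positions[j]
                if (px - x) ** 2 + (py - y) ** 2 + (pz - z) ** 2 <= cutoff_sq:
                    adjacent[i].append(j)
        adjacent[i].sort()
    return adjacent


def remote_galaxies(adjacent, owner, rank):
    """Sorted ids of galaxies owned by other ranks that `rank`'s galaxies reference.

    `owner[i]` is the (hypothetical) rank that stores galaxy i. Knowing this set
    up front allows one bulk fetch instead of many scattered remote reads.
    """
    needed = set()
    for i, neighbours in enumerate(adjacent):
        if owner[i] == rank:
            needed.update(j for j in neighbours if owner[j] != rank)
    return sorted(needed)


if __name__ == "__main__":
    pts = [(0.0, 0.0, 0.0), (0.5, 0.0, 0.0), (3.0, 3.0, 3.0), (3.2, 3.0, 3.0)]
    adj = build_adjacent_lists(pts, cutoff=1.0)
    print(adj)                                    # [[1], [0], [3], [2]]
    print(remote_galaxies(adj, [0, 1, 0, 1], 0))  # [1, 3]
```

In a UPC program, the set returned by `remote_galaxies` could plausibly be copied into local memory with bulk transfers such as `upc_memget` before the grouping loop runs, rather than dereferencing shared pointers one galaxy at a time. This is an assumption about how such a design might map to UPC, not a description of the paper's code.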

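Projects with papers in this journal

Research on Service Mechanisms and Support Systems for the National High Performance Computing Environment

Journal papers: 2

Development and Application of a Service Community for Innovative Optimization Design of Industrial Products

Journal papers: 4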

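Journal papers from the same project

Porting and Optimization of OpenFOAM on Sunway TaihuLight

A Coarse-Grained Semi-Automatic Parallelization Method for Loop Optimization and Irregular Code Segments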

A Hierarchical Load-Balancing Parallel Computing Method for Finite Element Structural Analysis

Journal information

Computer Science (计算机科学)
Peking University Core Journal (2011 edition)

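Supervising organization: Chongqing Southwest Information Co., Ltd. (formerly the Southwest Information Center of the Ministry of Science and Technology)
Sponsoring organization: Chongqing Southwest Information Co., Ltd. (formerly the Southwest Information Center of the Ministry of Science and Technology)
Editor-in-chief: CHEN Guoliang
Address: No. 18 Honghu West Road, Yubei District, Chongqing
Postal code: 401121
Email: jsjkx12@163.com
Phone: 023-63500828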

International Standard Serial Number (ISSN): 1002-137X
China Standard Serial Number (CN): 50-1075/TP
Postal distribution code: 78-68

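Awards:
2001 Chongqing Excellent Journal, 2004 Third Chongqing Excellent Science and Technology Journal, 2005 Chongqing Excellent Journal Editorial Office, 2010 Sixth Chongqing Journal Comprehensive Quality Assessment "Top Ten Science and Technology Journals", 2012 Chongqing Special Publishing Fund Newspaper and Periodical Grant Project (Chongqing ..., 2013 Chongqing Special Publishing Fund Key Academic Journal Grant Project (..., 2014 Chongqing Special Publishing Fund Journal Grant Project (Chongqing ..., 2015 "Outstanding Academic Journal of China with International Influence"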

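Indexed in (China and abroad):
Index Copernicus (Poland), Ulrich's Periodicals Directory (USA), Cambridge Scientific Abstracts (USA), Japan Science and Technology Agency (JST) database (Japan), Chinese Science and Technology Core Journals (China), Peking University Core Journals, 2004 edition (China), Peking University Core Journals, 2008 edition (China), Peking University Core Journals, 2011 edition (China), Peking University Core Journals, 2014 edition (China), Peking University Core Journals, 2000 edition (China)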

Times cited: 41227