Accelerating EMAN on GPU Cluster Systems (article in English)

ISSN: 1000-6737
Journal: Acta Biophysica Sinica (《生物物理学报》)
Date: not given
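Classification: TP302 [Automation and Computer Technology: Computer System Architecture; Automation and Computer Technology: Computer Science and Technology], TP334.7 [Automation and Computer Technology: Computer System Architecture; Automation and Computer Technology: Computer Science and Technology]
Author affiliations: [1] High Performance Computer Research Center, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190; [2] State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190; [3] University of Chinese Academy of Sciences, Beijing 100049
Funding: National Key Basic Research Program of China ("973" Program) (2012CB316502); National High Technology Research and Development Program of China ("863" Program) (2009AA01A129); Major Project of the Knowledge Innovation Program of the Chinese Academy of Sciences (KGCX1-Yw-13); National Natural Science Foundation of China (60803030, 60633040, 60925009, 60921002)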

Keywords: hash index, bioinformatics, high-throughput sequencing, FPGA, parallel accelerator

Chinese abstract (translated):

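In recent years, with the rapid development of high-throughput gene sequencing technology, both the cost and the turnaround time of sequencing have dropped substantially. However, the massive data-generation capacity of next-generation sequencing and the high degree of parallelism inherent in sequencing algorithms pose new challenges to the computing power of existing computers. Taking PerM, an open-source re-sequencing program built on a hash-index algorithm, as a case study, this paper investigates key techniques for accelerating the application on commercial multi-core CPUs. Experiments on a 64-core SMP system show that the proposed optimizations reduce the cache miss rate by 90% and improve performance by 4 to 11 times. The paper then discusses the design and implementation of a dedicated parallel acceleration system on an accelerator card carrying a Xilinx LX330 FPGA. As a prototype, a systolic-array parallel computing system with 11 processing elements was designed and implemented on an FPGA-based PCIe accelerator card. Compared with an 8-core Intel Xeon X7550 CPU, the proposed parallel accelerator achieves a 30- to 65-fold advantage in performance per watt.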

English abstract:

In recent years, owing to the rapid development of high-throughput next-generation sequencing (NGS) technologies, sequencing cost and time have been greatly reduced. However, both the explosion of NGS data and the massively parallel computation it requires pose great challenges to the capability of existing computers. We take PerM, an open-source re-sequencing algorithm based on a hash index, as an example to investigate optimizations for accelerating NGS analysis with commercial multi-core CPUs as well as with customized parallel architectures. First, we optimize the original algorithm by reordering the bucket access sequence so that data locality in the shared cache is improved. Second, to exclude empty hash buckets, we propose a hash-index compression algorithm that matches the sequential access pattern of the optimized algorithm. Experiments on a 64-core SMP (Intel Xeon X7550) show that the optimized algorithm reduces the LLC miss ratio to about 10% of that of the original algorithm, improving overall performance by 4 to 11 times. Furthermore, a parallel accelerator architecture is designed and evaluated on our customized FPGA accelerator card carrying a Xilinx LX330 FPGA. As a prototype, a systolic array of 100 PEs operating at 175 MHz is built. The performance of the proposed parallel accelerator architecture is demonstrated by a reported speedup of 30 to 65 times over an 8-core CPU.
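The two CPU-side optimizations in the abstract (accessing hash buckets in bucket order, and compressing the index so empty buckets take no space) can be illustrated with a minimal Python sketch. It is not PerM's code or data layout: the 2-bit k-mer encoding, the use of plain contiguous k-mers as seeds, and the function names `hash_kmer`, `build_compressed_index`, and `lookup_in_bucket_order` are all hypothetical simplifications. The index keeps only non-empty buckets as sorted keys plus an offsets array, and queries are sorted by bucket key so a single forward-moving cursor sweeps the index instead of probing it in read order.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical 2-bit base encoding used as the bucket function (an assumption).
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def hash_kmer(kmer: str) -> int:
    """Map a k-mer to its bucket key by 2-bit encoding each base."""
    value = 0
    for base in kmer:
        value = (value << 2) | CODE[base]
    return value


def build_compressed_index(ref: str, k: int) -> Tuple[List[int], List[int], List[int]]:
    """Index every k-mer of `ref`, storing only non-empty buckets.

    Bucket keys[i] owns positions[offsets[i]:offsets[i + 1]]; keys are sorted,
    so no slot is spent on any of the 4**k possible keys that never occur.
    """
    buckets: Dict[int, List[int]] = defaultdict(list)
    for pos in range(len(ref) - k + 1):
        buckets[hash_kmer(ref[pos:pos + k])].append(pos)
    keys = sorted(buckets)
    offsets = [0]
    positions: List[int] = []
    for key in keys:
        positions.extend(buckets[key])
        offsets.append(len(positions))
    return keys, offsets, positions


def lookup_in_bucket_order(reads: List[str], k: int, keys: List[int],
                           offsets: List[int], positions: List[int]) -> Dict[int, List[int]]:
    """Return {read_id: candidate reference positions} for each read's first k-mer.

    Queries are sorted by bucket key, so the compressed index is read once,
    front to back, rather than in the order the reads arrive.
    """
    queries = sorted((hash_kmer(read[:k]), rid) for rid, read in enumerate(reads))
    hits: Dict[int, List[int]] = {}
    cursor = 0
    for key, rid in queries:
        # Query keys never decrease, so the cursor only ever moves forward.
        while cursor < len(keys) and keys[cursor] < key:
            cursor += 1
        if cursor < len(keys) and keys[cursor] == key:
            hits[rid] = positions[offsets[cursor]:offsets[cursor + 1]]
        else:
            hits[rid] = []  # seed k-mer absent from the reference
    return hits


if __name__ == "__main__":
    keys, offsets, positions = build_compressed_index("ACGTACGTTGCA", k=3)
    print(len(keys), "non-empty buckets out of", 4 ** 3)
    print(lookup_in_bucket_order(["CGTAC", "TTGCA", "AAAAA"], 3, keys, offsets, positions))
```

With `k = 3`, the 12-base reference yields only 8 non-empty buckets out of 64 possible keys, which is the kind of waste the compression step removes. Because sorted query keys never decrease, the lookup touches the index strictly sequentially, the access pattern the abstract says the compression scheme is designed to fit.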
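The abstract does not say what each PE of the systolic array computes. Purely as an illustrative assumption, the sketch below simulates, cycle by cycle, a linear (semi-systolic) array that counts mismatches between one short read and every alignment position of a reference stream: PE `j` stores read base `j`, each reference base is broadcast to all PEs in one cycle, and partial mismatch counts shift one PE to the right per cycle. The function name `systolic_mismatch_counts` and this broadcast-plus-shift design are hypothetical and are not the paper's architecture.

```python
from typing import List


def systolic_mismatch_counts(read: str, ref: str) -> List[int]:
    """Simulate a linear array with one PE per read base (read must be non-empty).

    Returns the mismatch count of `read` against ref[s:s + len(read)]
    for every start position s, in order.
    """
    m = len(read)
    sums = [0] * m  # sums[j]: partial mismatch count currently held by PE j
    finished: List[int] = []
    for t, base in enumerate(ref):
        # Shift: each partial count moves one PE right; PE 0 starts alignment s = t.
        sums = [0] + sums[:-1]
        # Compute: the count now in PE j belongs to alignment s = t - j,
        # so comparing the broadcast base ref[t] with read[j] checks ref[s + j].
        sums = [count + (base != read[j]) for j, count in enumerate(sums)]
        # PE m-1 has just completed alignment s = t - (m - 1).
        if t >= m - 1:
            finished.append(sums[m - 1])
    return finished


if __name__ == "__main__":
    print(systolic_mismatch_counts("CGT", "ACGTACG"))
```

After `m - 1` cycles of pipeline fill, this model emits one finished alignment score per cycle regardless of read length, which is the property that makes systolic designs attractive on an FPGA; a real design would also have to handle reads longer than the number of PEs, which this sketch ignores.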
