Single particle reconstruction is one of the most important techniques for determining the three-dimensional structures of macromolecules, and its distinctive advantages have attracted growing attention in recent years. Its application, however, is severely limited by extremely long processing times and the lack of efficient parallel implementations. This study optimizes and parallelizes EMAN, one of the most widely used software packages for single particle reconstruction. An analysis of the algorithms in its major components shows that the key problem is achieving load balance while keeping communication overhead low. To solve this problem, we propose a self-adaptive dynamic scheduling algorithm, which applies not only to EMAN but also to other similar scheduling problems with independent tasks. Experiments show that the serial execution time of our optimized implementation is 11.50% lower than that of EMAN. With the self-adaptive dynamic scheduling algorithm, our parallel implementation achieves higher speedups than EMAN's built-in parallel implementation, and the speedup of classification, the most time-consuming component, is close to linear. On 16 CPU cores, the overall parallel efficiency of our implementation is 29.8% higher than that of EMAN's built-in parallel implementation. Our implementation therefore makes effective use of available computing resources and significantly shortens the time required for single particle reconstruction.
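The abstract does not give the details of the self-adaptive dynamic scheduling algorithm, so the following is only a minimal sketch of the general idea it names: balancing independent tasks across workers while keeping the number of scheduling requests (a stand-in for communication cost) low. It assumes a guided self-scheduling style rule, in which each idle worker requests a chunk whose size shrinks as the work runs out. The function names, the `factor` parameter, and the chunk rule are illustrative assumptions, not the authors' algorithm or any EMAN API.

```python
import heapq


def simulate_dynamic_schedule(task_costs, n_workers, factor=1, per_request_overhead=0.0):
    """Simulate idle workers pulling shrinking chunks of independent tasks.

    Hypothetical illustration only. Returns (makespan, number_of_requests).
    """
    # Each heap entry is (time at which the worker becomes free, worker id).
    free_at = [(0.0, w) for w in range(n_workers)]
    heapq.heapify(free_at)
    n_tasks = len(task_costs)
    next_task = 0
    requests = 0
    makespan = 0.0
    while next_task < n_tasks:
        # The earliest-free worker asks the scheduler for more work.
        now, worker = heapq.heappop(free_at)
        remaining = n_tasks - next_task
        # Assumed rule: large chunks early (few requests),
        # small chunks late (better balance at the end).
        chunk = max(1, remaining // (factor * n_workers))
        work = sum(task_costs[next_task:next_task + chunk])
        next_task += chunk
        requests += 1
        finish = now + per_request_overhead + work
        makespan = max(makespan, finish)
        heapq.heappush(free_at, (finish, worker))
    return makespan, requests


def simulate_static_schedule(task_costs, n_workers):
    """Baseline: contiguous, equally sized blocks assigned up front."""
    n = len(task_costs)
    block = -(-n // n_workers)  # ceiling division
    return max(sum(task_costs[i:i + block]) for i in range(0, n, block))
```

With skewed costs such as `[1.0] * 12 + [5.0] * 4` on 4 workers, the static split leaves one worker holding all the expensive tasks, whereas the chunked dynamic policy spreads them out using only 10 scheduling requests for 16 tasks. Raising `factor` yields smaller chunks and better balance at the price of more requests, which is the trade-off between load balance and communication overhead described in the abstract.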