The Bulk Synchronous Parallel (BSP) computing model is an important foundation for building distributed systems for large-scale iterative graph processing. Although existing platforms (e.g., Pregel, Giraph, and Hama) achieve high scalability, frequent synchronization and heavy communication between hosts severely degrade the efficiency of parallel computation. To address this key problem, this paper proposes GraphHP (Graph Hybrid Processing), an execution platform based on a hybrid model. GraphHP not only inherits the vertex-centric BSP programming interface but also significantly reduces synchronization and communication overhead. By applying a hybrid execution model within and between graph partitions, GraphHP performs pseudo-superstep iterations that decouple intra-partition computation from distributed synchronization and communication. This hybrid execution model effectively reduces synchronization and communication overhead without requiring heavy scheduling algorithms or graph-centric sequential algorithms. Finally, this paper evaluates how classic BSP applications are implemented on GraphHP; experiments show that GraphHP is more efficient than existing BSP platforms. Although GraphHP is implemented on top of Hama, it can easily be migrated to other BSP platforms.
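To make the idea of separating intra-partition computation from global synchronization more concrete, here is a minimal single-process sketch. It is not GraphHP's implementation or API. It assumes min-label propagation (connected components) as the workload and a fixed vertex-to-partition map, and the functions `bsp_min_label`, `hybrid_min_label`, and `_send` are hypothetical. In the plain BSP version every superstep ends with a global barrier. In the hybrid version, only cross-partition messages wait for the barrier; after it, each partition runs local pseudo-supersteps over its internal edges until it is quiescent. The reported numbers are barrier counts on a toy graph, not performance measurements.

```python
from collections import defaultdict

# Hypothetical sketch: vertex-centric min-label propagation (connected
# components), run once under plain BSP and once under a hybrid schedule.


def _send(adj, part, v, value, local_out, remote_out):
    """Send `value` to every neighbour of v, routing each message to the local
    buffer (same partition) or the remote buffer (crosses a partition)."""
    for u in adj[v]:
        target = local_out if part[u] == part[v] else remote_out
        target[u].append(value)


def bsp_min_label(adj):
    """Plain BSP: every superstep ends with a global barrier."""
    label = {v: v for v in adj}
    inbox = defaultdict(list)
    for v in adj:                      # superstep 0: announce own label
        for u in adj[v]:
            inbox[u].append(label[v])
    supersteps = 1
    while inbox:                       # each loop iteration is one superstep
        outbox = defaultdict(list)
        for v, msgs in inbox.items():
            m = min(msgs)
            if m < label[v]:
                label[v] = m
                for u in adj[v]:
                    outbox[u].append(m)
        inbox = outbox                 # simulated global barrier: all messages
        supersteps += 1
    return label, supersteps


def hybrid_min_label(adj, part):
    """Hybrid schedule: cross-partition messages wait for the global barrier;
    inside a partition, local pseudo-supersteps run until quiescent."""
    label = {v: v for v in adj}
    remote_in = defaultdict(list)
    global_iters, first = 0, True
    while first or remote_in:
        remote_out = defaultdict(list)
        for p in sorted(set(part.values())):
            members = [v for v in adj if part[v] == p]
            local_in = defaultdict(list)
            # Global phase: consume messages delivered at the barrier
            # (on the first iteration every vertex announces its label).
            for v in members:
                msgs = remote_in.get(v, [])
                if msgs and min(msgs) < label[v]:
                    label[v] = min(msgs)
                    _send(adj, part, v, label[v], local_in, remote_out)
                elif first:
                    _send(adj, part, v, label[v], local_in, remote_out)
            # Local phase: pseudo-supersteps over intra-partition edges only,
            # with no global synchronization between them.
            while local_in:
                local_out = defaultdict(list)
                for v, msgs in local_in.items():
                    m = min(msgs)
                    if m < label[v]:
                        label[v] = m
                        _send(adj, part, v, m, local_out, remote_out)
                local_in = local_out
        remote_in = remote_out         # simulated barrier: cross-partition only
        global_iters += 1
        first = False
    return label, global_iters


if __name__ == "__main__":
    # Toy input: an 8-vertex path split into partitions {0..3} and {4..7}.
    path = {v: [u for u in (v - 1, v + 1) if 0 <= u < 8] for v in range(8)}
    halves = {v: 0 if v < 4 else 1 for v in path}
    print("BSP supersteps:", bsp_min_label(path)[1])                       # 9
    print("hybrid global iterations:", hybrid_min_label(path, halves)[1])  # 3
```

On this toy path both versions assign label 0 to every vertex, but the hybrid schedule needs 3 global iterations instead of 9 supersteps, because the label crosses each partition through local pseudo-supersteps rather than one hop per barrier. The saving depends on how many edges the partitioning keeps internal: with one vertex per partition, every message is remote and the hybrid loop degenerates to the plain BSP schedule.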