伴随大数据的涌现,云存储和计算技术近年得到长足发展.图数据是一种重要而普遍的大数据,在生物信息学、社会网络、化学信息学等领域都有众多应用.因此,大图计算作为大数据分析应用的典型代表,正成为云端负载的重要组成部分.目前,高可扩展性的图计算主要依赖于高性能计算解决方案,需要进行环状(或网状)计算机网络之上的高效全集合通信.然而,在通用计算集群和云计算基础设施上实现基于环状计算机网络的算法时,低效的网络通信将导致巨大的系统延迟.因此,这就要求那些基于云端的大数据计算平台和系统具备十分良好的水平可扩展性.但是,大图的幂律分布和缺乏局部性使得设计一套高度可扩展的大图计算系统变得更具挑战.为此,文中提出了一种面向通用计算集群的可扩展大图计算模型.专注于水平扩展能力,设计了一种新颖的基于分离器-合并器BSP的图计算方法,能够提供原生的负载平衡,仅需很低的通信开销.从而,图数据规模的增大可以通过增加计算节点数量得以解决.最后,在一个图数据通用测试集上,通过大量实验验证了所提模型和方法的有效性和高效性;结果显示,相比经典的以顶点为中心的BSP大图计算模型和其他主流大图计算系统,所提改进的基于BSP的大图计算模型能够提供更好的水平可扩展性.
Along with the emergence of big data,cloud storage and computing technology lately has been developing rapidly.Graph data is an important and pervasive type of big data,which find a wide spectrum of applications,including bio-informatics,social networks,chem-informatics,etc.Massive graph computation,as a typical representative of big data analytics and application,is becoming an important part of workloads on the cloud.Currently,scalable graph computation mainly resorts to high performance computing solutions,which requires high performance all-toall collective communication over torus or mesh networks.However,implementing these torus or mesh-based algorithms on commodity clusters and cloud computing infrastructures may result in high latency,due to inefficient network communication.Thus,this requires those cloud-based platforms and systems to be of good scale-out capabilities.However,the vertex degree skewness and lack of locality on massive graphs further challenge the design of a highly scalable system.Toresolve the issues,we exploit a commodity cluster-oriented programming model for scalable graph computation.Focusing on scale-out capability,we propose a novel separator-combiner BSP-based graph computation method,which provides native load-balancing and low communication overhead.Thus,larger graphs can be addressed by adding more computing nodes.Finally,on a general testing benchmark for big graph data,extensive experiments confirm the effectiveness and efficiency of the proposed model and method.The results demonstrate that the proposed revised BSP-based massive graph computation model provides better scale-out capability,in comparison with the classic vertex-centric BSP-based graph computation model and other major massive graph commutating systems.