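The rapid growth in the size and complexity of networks poses new challenges for quickly discovering the structure and properties of large-scale networks. MapReduce and its open-source implementation, Hadoop, offer a promising route to processing large-scale graphs efficiently. Building on a cluster system that runs the MapReduce framework, a new computing model is proposed for 3-clique computation on large-scale graphs as a means of graph mining. The computation proceeds in three basic steps: first obtain the one-hop information of each node, then the two-hop information, and finally all 3-cliques based on that node. The model can be used to compute the clustering coefficient and to mine three large call networks. Experimental results show that the model has good scalability and performance.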
Large-scale graphs are ubiquitous. The continued exponential growth in both their size and complexity poses a new challenge for finding the structure and properties of a large-scale graph. A promising approach to handling very large graphs is the emerging MapReduce framework and its open-source implementation, Hadoop. Enumerating the 3-cliques of a graph is an important operation that supports structure mining, but it is difficult to carry out on a single computer when the graph is very large. In this paper, we propose a parallel computing model for 3-clique enumeration on large-scale graphs, based on a cluster system running MapReduce. The enumeration first extracts the one-hop information of the graph, then the two-hop information, and finally performs key-based 3-clique enumeration. We also apply the computing model to the computation of the clustering coefficient. The model is applied to three large real-world call graphs, and the experimental results demonstrate its good scalability and efficiency.
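
To make the three stages concrete, the following is a minimal single-process sketch in Python of how one-hop extraction, two-hop extraction, and key-based 3-clique enumeration could be arranged as MapReduce rounds. It is not the paper's Hadoop implementation: the helper names (`map_reduce`, `one_hop`, `enumerate_triangles`, `clustering_coefficients`) are hypothetical, the in-memory `map_reduce` only imitates the map, shuffle, and reduce phases, and the rule of keying each 3-clique on its smallest node to avoid duplicates is an assumption made for illustration. The clustering coefficient uses the standard local definition, C(u) = 2T(u) / (d(d - 1)), where T(u) is the number of 3-cliques containing u and d is its degree.

```python
from collections import defaultdict
from itertools import combinations

EDGE = None  # marker value meaning "this node pair is itself an edge"


def map_reduce(records, mapper, reducer):
    """Simulate one MapReduce round in memory: map, group by key, reduce."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    output = []
    for key in sorted(groups):
        output.extend(reducer(key, groups[key]))
    return output


def one_hop(edges):
    """Round 1: build each node's one-hop neighbor list from an edge list."""
    def mapper(edge):
        u, v = edge
        if u != v:  # ignore self-loops
            yield u, v
            yield v, u

    def reducer(node, neighbors):
        yield node, sorted(set(neighbors))  # set() drops duplicate edges

    return map_reduce(edges, mapper, reducer)


def enumerate_triangles(adjacency):
    """Round 2: emit two-hop paths v-u-w keyed by (v, w), then close them."""
    def mapper(record):
        u, neighbors = record
        # Assumption: only look at larger-numbered neighbors, so each
        # 3-clique is produced once, from its smallest node u.
        higher = [v for v in neighbors if v > u]
        for v in higher:
            yield (u, v), EDGE  # the edge (u, v) exists
        for v, w in combinations(higher, 2):
            yield (v, w), u  # two-hop path v-u-w

    def reducer(pair, values):
        # A two-hop path v-u-w is a 3-clique only if (v, w) is also an edge.
        if any(value is EDGE for value in values):
            for u in values:
                if u is not EDGE:
                    yield (u, pair[0], pair[1])

    return map_reduce(adjacency, mapper, reducer)


def clustering_coefficients(adjacency, triangles):
    """Local clustering coefficient per node from the 3-clique list."""
    count = defaultdict(int)
    for triangle in triangles:
        for node in triangle:
            count[node] += 1
    result = {}
    for node, neighbors in adjacency:
        d = len(neighbors)
        result[node] = 2 * count[node] / (d * (d - 1)) if d > 1 else 0.0
    return result
```

For the edge list `[(1, 2), (1, 3), (2, 3), (3, 4)]`, passing the output of `one_hop` to `enumerate_triangles` yields the single 3-clique `(1, 2, 3)`, and node 3, which has three neighbors but only one connected pair among them, gets a coefficient of 1/3. On a real cluster each round would be a separate job, with the shuffle phase doing the grouping that `map_reduce` performs here with a dictionary.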