With the development and spread of the Internet, more and more users have joined social networks, gradually forming large-scale and diverse communities. For social networking services such as Sina Weibo, detecting these communities can provide valuable information to users and merchants. Among community detection algorithms, the label propagation algorithm (LPA) has several advantages: a simple underlying idea, low complexity, and no need to specify the number of communities in advance. However, its accuracy is relatively low, and its efficiency is not high enough in a big-data environment. We propose DisLPA, a community detection algorithm built on the MapReduce distributed computing framework that introduces the node clustering coefficient into the LPA label-update process. Experiments show that the algorithm not only improves accuracy but also effectively alleviates the computational bottleneck.
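The abstract does not give DisLPA's exact update rule, so the following Python sketch is only an illustration under stated assumptions. Each neighbour's vote is weighted by 1 + its local clustering coefficient (a hypothetical weighting). Each node also votes for its own label, a common way to damp the oscillation that synchronous label updates can cause. One MapReduce round is simulated in memory: a map step emits `(node, (label, weight))` pairs and a reduce step picks the heaviest label per node. The names `clustering_coefficients`, `map_phase`, `reduce_phase` and `dis_lpa_sketch` are hypothetical.

```python
import math
from collections import defaultdict
from itertools import combinations


def clustering_coefficients(adj):
    """Local clustering coefficient of each node in an undirected graph (adjacency sets)."""
    cc = {}
    for v, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            cc[v] = 0.0
            continue
        # Count edges among v's neighbours; each unordered pair is checked once.
        links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        cc[v] = 2.0 * links / (k * (k - 1))
    return cc


def map_phase(adj, labels, cc):
    """Mapper (assumed design): node u sends its label, weighted by 1 + cc[u],
    to itself and to every neighbour."""
    for u, nbrs in adj.items():
        vote = (labels[u], 1.0 + cc[u])
        yield u, vote  # self-vote damps synchronous oscillation
        for v in nbrs:
            yield v, vote


def reduce_phase(pairs):
    """Reducer: group votes by node and adopt the label with the highest total
    weight; ties go to the smallest label so the result is deterministic."""
    scores = defaultdict(lambda: defaultdict(float))
    for v, (label, w) in pairs:
        scores[v][label] += w
    new_labels = {}
    for v, s in scores.items():
        best = max(s.values())
        new_labels[v] = min(l for l, x in s.items() if math.isclose(x, best))
    return new_labels


def dis_lpa_sketch(adj, max_iter=20):
    """Iterate map/reduce rounds until labels stop changing or max_iter is hit."""
    labels = {v: v for v in adj}  # every node starts in its own community
    cc = clustering_coefficients(adj)
    for _ in range(max_iter):
        new_labels = reduce_phase(map_phase(adj, labels, cc))
        if new_labels == labels:
            break
        labels = new_labels
    return labels


if __name__ == "__main__":
    # Two triangles joined by the bridge edge 2-3.
    graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
    print(dis_lpa_sketch(graph))
```

On the two-triangle example, the sketch converges in two rounds to `{0: 0, 1: 0, 2: 0, 3: 4, 4: 4, 5: 4}`, one label per triangle. In a real MapReduce job, the grouping done here by a dictionary would be performed by the framework's shuffle stage, and each round would be a separate job over the edge list.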