Detecting user communities that are cohesive in both interests and network structure plays an important role in applications such as microblog marketing and personalized recommendation. Most existing user community detection algorithms treat the relationships between users and user-generated content separately, which leads to unreasonable community structures, while the few methods that combine the two factors are relatively complex. The link community algorithm (LCA) is among the more efficient overlapping community detection algorithms and yields communities of good quality; however, it does not take the real interest characteristics of links into account when calculating the similarity between links during clustering. To address these issues in microblog user community detection, this paper proposes an R-C model of the microblog network, which takes follow relationships as the network nodes, places a potential edge between two follow relationships when they share a user, and uses the intersection of the interest sets of the two users joined by a follow relationship as that relationship's interest characteristics. A community detection method based on the R-C model is then discussed, and its complexity is analyzed. Finally, experiments on a Sina Weibo dataset compare the method with the node-based CNM algorithm and LCA in terms of interest cohesion and network-structure cohesion; the results indicate that the method based on the R-C model finds better microblog user communities.
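The construction of the R-C model described above can be illustrated with a minimal Python sketch. It is not the paper's implementation: the function name `build_rc_model`, the input format (a list of follower/followee pairs and a dictionary mapping each user to a set of interest tags), and the sample data are all assumptions made for illustration. Clustering on the resulting model is not shown.

```python
from itertools import combinations


def build_rc_model(follows, interests):
    """Build a toy R-C model (hypothetical helper, not from the paper).

    follows:   iterable of (follower, followee) pairs
    interests: dict mapping user -> set of interest tags
    """
    # Each follow relationship becomes a node; its interest characteristics
    # are the intersection of the interest sets of its two users.
    features = {
        (u, v): interests.get(u, set()) & interests.get(v, set())
        for u, v in follows
    }
    # Two relationship nodes get a potential edge when they share a user.
    edges = {
        frozenset((r1, r2))
        for r1, r2 in combinations(features, 2)
        if set(r1) & set(r2)
    }
    return features, edges


# Made-up sample data, for illustration only.
follows = [("a", "b"), ("b", "c"), ("d", "e")]
interests = {
    "a": {"music", "film"},
    "b": {"music", "sports"},
    "c": {"sports"},
    "d": {"food"},
    "e": {"travel"},
}
features, edges = build_rc_model(follows, interests)
print(features)
print(edges)
```

The pairwise comparison of relationships is quadratic in the number of follow relationships, which is acceptable for a sketch; indexing relationships by user would avoid comparing pairs that cannot share a user.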