The trawling algorithm has several shortcomings: it discovers too many Web communities, its community cores share a high proportion of duplicate pages, and its strict definition of a community produces isolated communities. To address these problems, a blog community detection algorithm based on Formal Concept Analysis (FCA) is proposed. First, a concept lattice is constructed from the link relations among blogs. Next, the original concept lattice is partitioned into equivalence classes through algebraic resolution of the lattice. Finally, within each partition, the structural similarity between the extents and intents of concepts is measured, and community cores are merged to form communities. Experimental results show that, in the test data set, community cores with a network density greater than 40% account for 83.420% of all cores, the network diameter of the merged communities is 3, and the richness of community content is improved. The proposed algorithm can be applied effectively to community detection in blogs, microblogs, and other social networks, and has significant application value and practical significance.
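To make the first and last steps concrete, here is a minimal Python sketch and not the authors' implementation. It builds a formal context from blog links (objects are linking blogs, attributes are the blogs they link to), enumerates the formal concepts by brute force, and scores two concepts by a similarity of their extents and intents. The toy `links` data, the function names, the equal-weight Jaccard measure, and the choice to treat each concept as a candidate community core are all assumptions made for illustration; the abstract does not specify the actual similarity measure or the equivalence partition, so the partition step is not shown.

```python
from itertools import combinations

# Hypothetical toy data: each blog maps to the set of blogs it links to.
links = {
    "a": {"x", "y"},
    "b": {"x", "y"},
    "c": {"y", "z"},
    "d": {"y", "z"},
}


def intent_of(blogs, links, all_targets):
    """Targets that every blog in `blogs` links to."""
    result = set(all_targets)
    for blog in blogs:
        result &= links[blog]
    return frozenset(result)


def extent_of(targets, links):
    """Blogs that link to every target in `targets`."""
    return frozenset(b for b, out in links.items() if targets <= out)


def formal_concepts(links):
    """Enumerate all (extent, intent) pairs by closing every subset of blogs.

    This is exponential in the number of blogs and only suitable for toy data.
    """
    all_targets = set().union(*links.values())
    blogs = list(links)
    concepts = set()
    for size in range(len(blogs) + 1):
        for subset in combinations(blogs, size):
            intent = intent_of(subset, links, all_targets)
            extent = extent_of(intent, links)
            concepts.add((extent, intent))
    return concepts


def jaccard(s, t):
    """Jaccard index of two sets; two empty sets count as identical."""
    union = s | t
    return len(s & t) / len(union) if union else 1.0


def similarity(c1, c2, weight=0.5):
    """Assumed measure: weighted mean of extent and intent Jaccard indices."""
    (e1, i1), (e2, i2) = c1, c2
    return weight * jaccard(e1, e2) + (1 - weight) * jaccard(i1, i2)


concepts = formal_concepts(links)
core_ab = (frozenset({"a", "b"}), frozenset({"x", "y"}))
core_cd = (frozenset({"c", "d"}), frozenset({"y", "z"}))
score = similarity(core_ab, core_cd)
```

On this toy context the two candidate cores have disjoint extents and share one of three linked blogs, so `score` is 1/6. A merging step could then combine cores whose score exceeds a chosen threshold, where the threshold itself would be a further assumption.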