Traditional microblog community detection algorithms suffer from low cohesion and an overlap degree that cannot be controlled. To address these problems, this paper proposes Tag Cut, a top-down (divisive) strategy for overlapping microblog community detection based on core tags. First, tags are weighted using the co-occurrence of user tags and the inverse user frequency, and each user's tags are expanded according to the intra- and inter-correlation between tags. Next, the users in the whole community who hold a given tag are extracted as a temporary group, and an evaluation function assesses the quality of the resulting partition. Finally, the most suitable core tag is selected, and the distance between its group and the other groups determines whether that group becomes a new group or is merged into an existing one. This procedure is iterated until the stopping requirements are satisfied. The detected communities consist of several groups, each with its own core tag, and the partition is refined using the users' declared and implicit interests, the following patterns among users, and the practical usefulness of the result. Experiments on real data show that the method yields high cohesion and a controllable community overlap degree, and that its results have practical significance.
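The tag-weighting and temporary-group steps can be illustrated with a minimal Python sketch. The abstract does not give the weighting formula, so the sketch assumes that inverse user frequency is defined by analogy with inverse document frequency, log(N / n_t), where N is the number of users and n_t is the number of users holding tag t. The co-occurrence term, the tag expansion, and the evaluation function are omitted. The function names and the sample data are hypothetical and do not come from the paper.

```python
import math
from collections import Counter


def inverse_user_frequency(user_tags):
    """Assumed weighting: log(N / n_t) for each tag t.

    N is the number of users and n_t is the number of users holding tag t.
    This is an IDF-style stand-in, not the paper's actual formula.
    """
    n_users = len(user_tags)
    # Count each tag once per user who holds it.
    counts = Counter(tag for tags in user_tags.values() for tag in set(tags))
    return {tag: math.log(n_users / n) for tag, n in counts.items()}


def temporary_group(user_tags, tag):
    """Return the set of users whose tag set contains the given tag."""
    return {user for user, tags in user_tags.items() if tag in tags}


# Hypothetical sample data: user id -> set of declared tags.
user_tags = {
    "u1": {"music", "travel"},
    "u2": {"music", "python"},
    "u3": {"music"},
    "u4": {"python", "travel"},
}

weights = inverse_user_frequency(user_tags)
group = temporary_group(user_tags, "python")
```

Under this assumed weighting, a tag held by most users (here `music`) receives a lower weight than a rarer one (`travel` or `python`), so rarer tags are more distinctive candidates for core tags.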