User relationship analysis on microblogs is a current research focus, and computing the similarity between microblog users is its foundation. When discovering similar users, existing methods concentrate mainly on followees and followers (fans), and their computation of post similarity and interaction correlation makes insufficient use of the dynamic nature of microblogs. This work proposes a new method for discovering users similar to a specific microblog user. Its main contributions are as follows: (1) Visitors are introduced alongside followees and followers, extending existing Ego Network models that are built only from followees and followers and increasing the diversity of the similar users discovered. (2) Based on the dynamic character of microblog social activity, methods are proposed for computing the similarity between users' dynamic microblog posts and the correlation between their dynamic interactions. These methods use time slices as the basis for partitioning dynamic social activity and exponential decay as the accumulation strategy, which makes similarity computation among microblog users more reasonable and the discovered similar users more accurate. Taking Sina Weibo as a case study, we selected 50 seed users from five domains (academic research, business management, education, culture, and military) and used S@n (the score of the top n users) as the evaluation metric to analyze and compare similar-user discovery methods experimentally. The results show that visitors extend the range of discovered similar users, accounting for 32% of all similar users found, and that the dynamic post similarity and interaction correlation methods improve user similarity computation, raising S@n by 1.3 over the latest existing methods.
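The accumulation strategy mentioned above (similarity computed per time slice, then combined with exponential decay) can be illustrated with a minimal sketch. The abstract does not give the paper's formulas, so everything here is an assumption: the function names, the decay parameter `decay_rate`, the choice to give the newest slice weight 1 and normalize by the total weight, and the use of bag-of-words cosine similarity within each slice are all hypothetical stand-ins, not the authors' method.

```python
import math
from collections import Counter
from typing import List, Sequence


def decayed_similarity(slice_sims: Sequence[float], decay_rate: float = 0.5) -> float:
    """Combine per-time-slice similarities with exponential decay (hypothetical).

    slice_sims[0] is the oldest slice and slice_sims[-1] the most recent.
    Slice t gets weight exp(-decay_rate * (T - 1 - t)), so the newest slice
    has weight 1 and older slices count less. Dividing by the sum of weights
    keeps the result in the same range as the inputs.
    """
    if not slice_sims:
        return 0.0
    if decay_rate < 0:
        raise ValueError("decay_rate must be non-negative")
    last = len(slice_sims) - 1
    weights = [math.exp(-decay_rate * (last - t)) for t in range(len(slice_sims))]
    weighted_sum = sum(w * s for w, s in zip(weights, slice_sims))
    return weighted_sum / sum(weights)


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two term-frequency vectors (assumed per-slice measure)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def dynamic_post_similarity(
    posts_u: List[List[str]], posts_v: List[List[str]], decay_rate: float = 0.5
) -> float:
    """Hypothetical dynamic post similarity between users u and v.

    posts_u[t] and posts_v[t] are the tokens each user posted in time slice t;
    both lists are assumed to cover the same aligned slices.
    """
    sims = [cosine(Counter(pu), Counter(pv)) for pu, pv in zip(posts_u, posts_v)]
    return decayed_similarity(sims, decay_rate)
```

With `decay_rate=0.0` every slice is weighted equally, so two users who match only in the older of two slices score 0.5; with a positive `decay_rate`, matching in the most recent slice scores higher than matching only in an older one, which is the intended effect of decaying older activity.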