With the rapid growth of microblogging, tracing the source of microblog information has become an important research topic in information content security management. Previous work on information source tracing has suffered from low-purity event data and from an incomplete account of the factors behind user influence; this paper addresses both problems. First, because collected microblog data have low purity, we apply event clustering to a large volume of Sina Weibo data to obtain the microblogs related to a specific event. Next, to obtain a more reasonable user influence value, we analyze the influence of the users involved in that event, taking multiple factors into account. Finally, we combine each microblog's post time with its author's influence value and apply the Hacker News ranking algorithm to trace the source of the event. The analysis identifies as the event source a microblog that was posted early in the event's propagation and whose author has high influence, indicating that both early post time and user influence are key features for source tracing.
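The final ranking step can be illustrated with a minimal sketch. The commonly cited form of the Hacker News ranking formula is score = (P - 1) / (T + 2)^G, with P the vote count, T the item age in hours, and G a gravity exponent (often quoted as 1.8). The abstract does not say how the formula is adapted, so everything below is an assumption made for illustration: the user influence value stands in for P (without the "- 1" term), T is taken as the number of hours a post appeared after the earliest post in the event cluster, and the names `Post`, `source_score`, and `rank_candidates` are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime

# Gravity exponent from the commonly cited Hacker News formula (assumed here).
GRAVITY = 1.8


@dataclass
class Post:
    post_id: str
    posted_at: datetime
    influence: float  # hypothetical precomputed influence value of the author


def source_score(post: Post, first_seen: datetime, gravity: float = GRAVITY) -> float:
    """Hacker News style score: influence decayed by lateness within the event."""
    # Assumption: T is the delay, in hours, relative to the earliest post
    # in the event cluster, so earlier posts are penalized less.
    hours_late = (post.posted_at - first_seen).total_seconds() / 3600.0
    return post.influence / (hours_late + 2.0) ** gravity


def rank_candidates(posts: list[Post]) -> list[Post]:
    """Return the posts ordered from most to least likely event source."""
    first_seen = min(p.posted_at for p in posts)
    return sorted(posts, key=lambda p: source_score(p, first_seen), reverse=True)
```

Under these assumptions the top-ranked post is the source candidate. A post by a highly influential user that appears many hours after the event started can still rank below an earlier post by a less influential user, because the denominator grows faster than linearly with the delay.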