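To reduce algorithmic complexity and improve generality, a novel time-sensitive topic detection technique is proposed. The technique extracts microblog content and builds a term life-cycle model based on a novel aging theory in order to mine newly emerging terms. A term that occurs frequently in a specified time interval but did not occur in the preceding period can be taken to signal an emerging event. In addition, since the importance of content also depends on its source, the well-known PageRank algorithm is used to analyze the social network relationships and determine the authority of users. User authority and emerging terms are then combined to detect hot topics within a user-specified time limit. Several experiments on a Sina Weibo dataset show that the algorithm efficiently identifies the hot topics of a given time period.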
A novel topic detection technique is proposed in this paper. First, the contents of the tweets are extracted and the term life cycle is modeled according to a novel aging theory intended to mine emerging terms. A term is defined as emerging if it occurs frequently in the specified time interval and was relatively rare in the past. Moreover, considering that the importance of content also depends on its source, we analyze the social relationships in the network with the well-known PageRank algorithm in order to determine the authority of the users. Finally, an algorithm that combines user authority and the emerging terms is used to detect new topics. We provide several case studies that show the validity of the proposed approach.
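The pipeline described above can be illustrated with a minimal sketch. The abstract does not give the aging formula, so the sketch substitutes a simple ratio test between the current interval and past intervals; this is an assumption, not the authors' model. The function names (`pagerank`, `emerging_terms`), the `ratio` threshold, and the input layout (a follower graph and `(user, interval, terms)` tuples) are likewise hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def pagerank(follows, damping=0.85, iters=50):
    """Power-iteration PageRank over a follower graph.

    `follows` maps each user to the list of users they follow, so an
    edge u -> v passes authority from u to v.
    """
    users = set(follows) | {v for vs in follows.values() for v in vs}
    n = len(users)
    rank = {u: 1.0 / n for u in users}
    for _ in range(iters):
        # Rank held by users who follow nobody is spread evenly over all users.
        dangling = sum(rank[u] for u in users if not follows.get(u))
        new = {u: (1 - damping) / n + damping * dangling / n for u in users}
        for u, vs in follows.items():
            for v in vs:
                new[v] += damping * rank[u] / len(vs)
        rank = new
    return rank


def emerging_terms(posts, rank, current, history, ratio=2.0):
    """Return terms that are emerging in interval `current`.

    `posts` is a list of (user, interval, terms) tuples. The weight of a
    term in an interval is the summed authority of the users who posted it.
    Assumption: a term counts as emerging when its current weight exceeds
    `ratio` times its mean weight over the `history` intervals.
    """
    weight = defaultdict(float)
    for user, interval, terms in posts:
        for term in set(terms):  # count each term once per post
            weight[(term, interval)] += rank.get(user, 0.0)

    scores = {}
    for (term, interval), w in weight.items():
        if interval != current:
            continue
        past = sum(weight.get((term, t), 0.0) for t in history) / len(history)
        if w > ratio * past:
            scores[term] = w - past
    # Strongest burst first.
    return sorted(scores, key=scores.get, reverse=True)
```

Weighting each occurrence by the author's PageRank score means that a term posted by a few authoritative users can outrank one posted by many low-authority users, which matches the idea that the importance of content depends on its source.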