Social media such as microblogs and forums have gradually become important platforms where users express and exchange views and acquire and disseminate information. However, social media text is massive in scale, diverse in form, and fast-spreading, which places new demands on topic detection and tracking techniques traditionally applied to news reporting, public opinion monitoring, text mining, and information consulting. Against this background, this article surveys the main current methods for topic detection and tracking in social media text from three perspectives: offline topic detection, online topic detection, and topic evolution tracking. It then analyzes the evaluation methods commonly used in experiments in this field. Finally, it summarizes the major limitations and challenges of current work and outlines directions for future research.