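With the rapid development of network information technology, the Internet has become one of the most important platforms for people to obtain and publish information. As information spreads across the Internet, the texts related to a topic are continually updated, and the content focus shifts as the topic develops. Identifying the content focus of a topic helps to mine and analyze online information effectively, and is an important research problem in the field of online public opinion analysis. For streaming web text, this paper proposes a method for identifying the content focus of online topics. It first analyzes how focus features are distributed in the text stream and, based on that analysis, presents the algorithmic models for the three main steps of the method: extraction of focus feature words based on time attributes, merging of content-focus feature words, and representation of the content focus. The proposed method is validated experimentally on real data collected from the web. The results show that it effectively captures the content focus as a topic develops and can represent that focus as sets of keywords and sets of sentences.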
With the rapid development of information technology, the Internet has become one of the most important platforms for people to get information. In daily life, people often encounter the following phenomenon: the news on a given topic is constantly updated, but the reports focus on different content. These different focuses include: focus generated by the development of events; new focus caused by an increase in user reviews of the topic; and focus migration due to the impact of other hot news topics in the same period. In this paper, we propose a method for analyzing and identifying the evolutionary focus of topics. The method consists of three parts: feature selection based on time attributes, a feature combination model, and focus presentation. The experimental results show that this method can identify the evolutionary focus of topics effectively.
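
To make the three parts named in the abstract concrete, the following is a minimal illustrative sketch, not the paper's actual models. It assumes that documents are already tokenized and tagged with a time-window id, that a "time attribute" score can be approximated by how much a word's share of tokens in one window exceeds its share over the whole stream, and that feature words can be merged when the documents containing them overlap (Jaccard similarity). The function names `burst_words`, `merge_words`, and `represent_focus`, and all thresholds, are hypothetical.

```python
from collections import Counter, defaultdict


def burst_words(docs, window, min_ratio=1.2, min_count=2):
    """Step 1 (assumed scoring): pick words over-represented in `window`.

    `docs` is a list of (window_id, tokens) pairs. A word is kept when its
    share of tokens in `window` is at least `min_ratio` times its share of
    tokens over the whole stream.
    """
    total, in_window = Counter(), Counter()
    for win, tokens in docs:
        total.update(tokens)
        if win == window:
            in_window.update(tokens)
    n_total, n_window = sum(total.values()), sum(in_window.values())
    if n_window == 0:
        return set()
    selected = set()
    for word, count in in_window.items():
        if count < min_count:
            continue
        ratio = (count / n_window) / (total[word] / n_total)
        if ratio >= min_ratio:
            selected.add(word)
    return selected


def merge_words(words, docs, window, min_jaccard=0.5):
    """Step 2 (assumed rule): group words that occur in the same documents.

    Each word joins the first group whose first member shares enough
    documents with it; otherwise it starts a new group.
    """
    words = set(words)
    doc_ids = defaultdict(set)
    for i, (win, tokens) in enumerate(docs):
        if win != window:
            continue
        for word in set(tokens) & words:
            doc_ids[word].add(i)
    groups = []
    for word in sorted(words):
        for group in groups:
            rep = group[0]
            union = doc_ids[word] | doc_ids[rep]
            shared = doc_ids[word] & doc_ids[rep]
            if union and len(shared) / len(union) >= min_jaccard:
                group.append(word)
                break
        else:
            groups.append([word])
    return groups


def represent_focus(group, docs, window, top_k=1):
    """Step 3: return (keyword list, top_k texts with the most keywords)."""
    keywords = set(group)
    scored = []
    for win, tokens in docs:
        if win != window:
            continue
        hits = len(keywords & set(tokens))
        if hits:
            scored.append((hits, " ".join(tokens)))
    scored.sort(key=lambda pair: -pair[0])  # stable: ties keep stream order
    return sorted(keywords), [text for _, text in scored[:top_k]]
```

Each merged group plays the role of one content focus, returned as a keyword set plus representative sentences, which mirrors the two output forms the abstract mentions. The burst ratio and the Jaccard threshold are stand-ins chosen for brevity; the paper's own feature selection and combination models may differ substantially.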