Public opinion tracking follows the development of designated hot topics in the media information stream in real time, and it has become an active research topic in natural language processing in recent years. The core technique for this task is text classification. The authors use information gain and mutual information to compute feature weights, and the features with the highest weights are extracted as the effective features of the document representation in the vector space model. Three methods, Rocchio, K-Nearest Neighbor (KNN), and Bayes, are then applied to track public opinion on events of a given topic. Finally, the authors present a statistical analysis of the results; the best performance on the test set is an F-Measure of 86.2%. Public opinion tracking has broad application prospects in areas such as information security, and it gives users an effective basis for judging the development trend of hot network events in a timely manner.
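As a rough illustration of the pipeline summarized above, the following minimal sketch selects features by information gain, builds Rocchio-style class centroids in a vector space model, and scores predictions with the F-Measure. It is not the authors' implementation: the toy documents, the labels `"on"` and `"off"` (on-topic and off-topic), the use of binary term presence for information gain, raw term counts for the vectors, and all function names are assumptions made for this example.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(docs, labels, term):
    """Information gain of the binary feature 'term occurs in the document'."""
    with_t = [y for d, y in zip(docs, labels) if term in set(d)]
    without_t = [y for d, y in zip(docs, labels) if term not in set(d)]
    n = len(labels)
    return (entropy(labels)
            - len(with_t) / n * entropy(with_t)
            - len(without_t) / n * entropy(without_t))

def select_features(docs, labels, k):
    """Keep the k terms with the highest information gain."""
    vocab = sorted({t for d in docs for t in d})
    ranked = sorted(vocab, key=lambda t: information_gain(docs, labels, t),
                    reverse=True)
    return ranked[:k]

def vectorize(doc, features):
    """Represent a document as raw term counts over the selected features."""
    counts = Counter(doc)
    return [float(counts[f]) for f in features]

def cosine(a, b):
    """Cosine similarity; defined as 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def train_rocchio(docs, labels, features):
    """Simplified Rocchio: one centroid per class, no negative-example term."""
    by_class = {}
    for d, y in zip(docs, labels):
        by_class.setdefault(y, []).append(vectorize(d, features))
    return {y: [sum(col) / len(vs) for col in zip(*vs)]
            for y, vs in by_class.items()}

def predict(model, doc, features):
    """Assign the class whose centroid is most similar to the document."""
    v = vectorize(doc, features)
    return max(model, key=lambda y: cosine(model[y], v))

def f_measure(gold, pred, positive):
    """Balanced F-Measure (harmonic mean of precision and recall)."""
    tp = sum(g == positive and p == positive for g, p in zip(gold, pred))
    fp = sum(g != positive and p == positive for g, p in zip(gold, pred))
    fn = sum(g == positive and p != positive for g, p in zip(gold, pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Hypothetical toy corpus: tokenized documents labeled on-topic or off-topic.
train_docs = [
    ["flood", "rescue", "city"],
    ["flood", "rain", "warning"],
    ["match", "goal", "team"],
    ["team", "coach", "season"],
]
train_labels = ["on", "on", "off", "off"]

features = select_features(train_docs, train_labels, k=2)
model = train_rocchio(train_docs, train_labels, features)

test_docs = [["flood", "city", "rain"], ["team", "goal"]]
predictions = [predict(model, d, features) for d in test_docs]
print(features, predictions)
```

Swapping `information_gain` for a mutual information score, or replacing the centroid comparison with a KNN vote or a Bayes classifier, would follow the same select-vectorize-classify structure.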