Traditional algorithms for tracking hot spots in Internet public opinion rely on text clustering; when processing massive numbers of web pages, clustering is too slow and the resulting clusters are of poor quality. This paper proposes a hot spot tracking scheme for Internet public opinion based on keyword extraction, and designs separate hot spot analysis models for news, forums (BBS), and blogs according to their different characteristics. Experiments on the Woodpecker (啄木鸟) Internet public opinion system developed by the authors show that the scheme is effective and that the hot spot analysis models identify hot spots with high accuracy.