To retrieve information of interest from the vast number of web pages on the Internet more accurately and efficiently, this paper designs and implements a topic detection and tracking system for Internet news. Building on this system, it proposes a strategy for selecting the clustering algorithm used in topic detection over massive collections of web pages, together with a topic tracking model based on multiple features. The model distinguishes well between topics that are merely similar and topics that are the same, and it achieves a topic tracking accuracy of 85.7%. The experimental results show that the system detects and tracks topics on the Internet effectively.