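This work focuses on improving the hierarchical clustering algorithm used in the event detection model. Building on keyword extraction, it proposes computing the similarity between news stories from the various elements (metadata) of each story. An online news retrieval system was built, and experiments were run on it using a news corpus from Xinhua News Agency. The results show that the improved method is clearly effective, with a significant performance gain over the system without it.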
This paper proposes an algorithm that combines several methods to address the problem. It uses keyword extraction to reduce the vector space model of each news story, selecting the keywords that best represent the story. It then proposes a hierarchical clustering algorithm that uses news metadata for similarity computation in a unified framework. Based on this approach, it builds an online news search system that organizes news data into news events and provides personalized services for users. Experimental results on news data from Xinhua News Agency show that both proposed approaches effectively improve performance on the RED task compared with the baseline method.
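The pipeline summarized above (keyword reduction, metadata-aware similarity, hierarchical clustering) can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not specify the keyword weighting, the similarity formula, or the linkage criterion, so the smoothed TF-IDF weights, the weighted sum of cosine and Jaccard similarity, the average-linkage merge rule, and the parameters `k`, `alpha`, and `threshold` are all assumptions chosen for illustration. All function names are hypothetical.

```python
import math
from collections import Counter


def top_keywords(docs, k=3):
    """Reduce each tokenized story to its k highest-weighted terms.

    Assumption: smoothed TF-IDF stands in for the paper's keyword extractor.
    """
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    reduced = []
    for doc in docs:
        tf = Counter(doc)
        weights = {
            t: c * (math.log((1 + n) / (1 + df[t])) + 1.0) for t, c in tf.items()
        }
        # Highest weight first; ties broken alphabetically for determinism.
        best = sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))[:k]
        reduced.append(dict(best))
    return reduced


def cosine(a, b):
    """Cosine similarity between two sparse keyword-weight dicts."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(
        sum(w * w for w in b.values())
    )
    return dot / norm if norm else 0.0


def jaccard(a, b):
    """Jaccard overlap between two sets of metadata values."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def story_similarity(kw_a, kw_b, meta_a, meta_b, alpha=0.7):
    """Assumed combination: weighted sum of keyword and metadata similarity."""
    return alpha * cosine(kw_a, kw_b) + (1 - alpha) * jaccard(meta_a, meta_b)


def cluster_stories(keywords, metadata, threshold=0.3, alpha=0.7):
    """Average-linkage agglomerative clustering, stopping below `threshold`."""
    n = len(keywords)
    sim = [
        [
            story_similarity(keywords[i], keywords[j], metadata[i], metadata[j], alpha)
            for j in range(n)
        ]
        for i in range(n)
    ]
    clusters = [[i] for i in range(n)]
    while len(clusters) > 1:
        best, pair = -1.0, None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                links = [sim[p][q] for p in clusters[i] for q in clusters[j]]
                avg = sum(links) / len(links)
                if avg > best:
                    best, pair = avg, (i, j)
        if best < threshold:
            break  # no remaining pair of clusters is similar enough to merge
        i, j = pair
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters
```

Calling `cluster_stories(top_keywords(docs), metadata)` on a list of tokenized stories and a parallel list of metadata sets returns groups of story indices, each group standing for one candidate news event. Keeping the keyword and metadata terms separate makes it easy to vary `alpha` and see how much the metadata contributes.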