With online information growing explosively, it is increasingly important to grasp key information promptly and to track how online events unfold. This paper presents an algorithm for discovering bursty events. The algorithm introduces a burst measure for the words in a text and accounts for factors such as the influence of a word's position on its weight, which improves the accuracy of the computed term weights. Using a maximum-link algorithm based on a preset density, texts that lie within the average radius and satisfy certain conditions are joined into a link, which in turn forms a cluster, so that similar texts are grouped by cluster. By combining burst values with positional influence, the clustering algorithm can reasonably partition the texts from a given period and assign them to the corresponding topics. Experimental results show that the algorithm performs well in discovering bursty events.
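As a rough illustration of the two ideas summarized above, the following Python sketch weights terms by a burst value and a position factor, then chains texts that fall within an average radius. The abstract does not give the actual formulas, so everything here is an assumption made for illustration: the ratio-based `burst_value`, the `title_boost` position factor, cosine distance, the use of the mean pairwise distance as the "average radius", and the `min_neighbors` density condition are all hypothetical stand-ins, not the paper's definitions.

```python
import math
from collections import Counter


def burst_value(count_now, count_before, smoothing=1.0):
    # Hypothetical burst measure: smoothed ratio of a word's frequency in the
    # current time window to its frequency in the preceding window.
    return (count_now + smoothing) / (count_before + smoothing)


def weight_vector(title_words, body_words, now_counts, before_counts,
                  title_boost=2.0):
    # Assumed position factor: a title occurrence counts title_boost times,
    # a body occurrence counts once. Each term score is scaled by its burst.
    scores = Counter()
    for word in body_words:
        scores[word] += 1.0
    for word in title_words:
        scores[word] += title_boost
    return {
        word: score * burst_value(now_counts.get(word, 0),
                                  before_counts.get(word, 0))
        for word, score in scores.items()
    }


def cosine_distance(a, b):
    dot = sum(value * b.get(word, 0.0) for word, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def link_clusters(vectors, min_neighbors=1):
    # Chain texts together: a text extends the chain to every neighbor within
    # the radius, provided it has at least min_neighbors such neighbors.
    n = len(vectors)
    if n < 2:
        return [[i] for i in range(n)]
    dist = [[cosine_distance(vectors[i], vectors[j]) for j in range(n)]
            for i in range(n)]
    pairs = [dist[i][j] for i in range(n) for j in range(i + 1, n)]
    radius = sum(pairs) / len(pairs)  # assumed "average radius"
    neighbors = [[j for j in range(n) if j != i and dist[i][j] <= radius]
                 for i in range(n)]
    assigned = [False] * n
    clusters = []
    for start in range(n):
        if assigned[start]:
            continue
        assigned[start] = True
        cluster, stack = [start], [start]
        while stack:
            i = stack.pop()
            if len(neighbors[i]) < min_neighbors:
                continue  # too sparse to extend the chain
            for j in neighbors[i]:
                if not assigned[j]:
                    assigned[j] = True
                    cluster.append(j)
                    stack.append(j)
        clusters.append(sorted(cluster))
    return clusters


# Toy data: two texts about an earthquake, two about football.
now_counts = {"earthquake": 10}
before_counts = {"earthquake": 1}
docs = [
    (["earthquake"], ["earthquake", "rescue"]),
    (["earthquake"], ["rescue", "aid"]),
    (["football"], ["football", "match"]),
    (["match"], ["football", "goal"]),
]
vectors = [weight_vector(t, b, now_counts, before_counts) for t, b in docs]
clusters = link_clusters(vectors)
```

On this toy input the two earthquake texts share a heavily weighted bursty term and end up in one chain, while the football texts form another. A real system would need the paper's own definitions of burst value, position weight, and link condition in place of these placeholders.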