Traditional incremental K-means has a number of drawbacks when applied to event detection. To overcome them, this paper proposes an improved incremental K-means algorithm (IIKM) for event detection. The algorithm initializes the cluster centers with a density function method, so that the initial centers are selected objectively. It can be used for both on-line and retrospective detection, and its results are only slightly affected by the order in which the news stories are processed. The selection of the effective density radius and of the feature space dimension is discussed, and the performance of the proposed method is compared with that of the Single-pass method and traditional K-means. The experimental results indicate that the proposed method is effective.