The k-means clustering algorithm is sensitive to its initial cluster centers and is therefore prone to becoming trapped in local optima. Existing methods for initializing cluster centers have not yet gained wide acceptance. We assume that each cluster contains at least one dense region of data, and that dense regions in different clusters lie farther apart than dense regions within the same cluster. Under this assumption, we construct a minimum spanning tree on the data set, apply the rooted-tree principle to search the tree for dense regions and estimate their densities, and select the regions that are both dense and sufficiently well separated. Points within the selected regions serve as the initial cluster centers, which yields a new method for cluster center initialization. Simulation experiments comparing the proposed method with existing methods show that the proposed method performs better.
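The general idea of seeding k-means from a minimum spanning tree can be illustrated with a minimal Python sketch. It is not the method of this paper: the abstract does not specify the rooted-tree search or the density estimator, so the sketch substitutes two assumptions of its own. Density is approximated by the inverse mean length of the MST edges incident to a point, and separation is enforced by greedily choosing the point that maximizes density times distance to the nearest center already chosen. The names `mst_edges`, `mst_density`, and `initial_centers` are hypothetical.

```python
import math


def mst_edges(points):
    """Prim's algorithm on the complete Euclidean graph.

    Returns a list of (i, j, length) tuples, one per MST edge.
    """
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n  # shortest known edge from the tree to each vertex
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        # Add the vertex outside the tree that is closest to the tree.
        u = min((i for i in range(n) if not in_tree[i]), key=best.__getitem__)
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u, best[u]))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return edges


def mst_density(points):
    """Assumed density proxy: inverse mean length of incident MST edges."""
    n = len(points)
    total = [0.0] * n
    degree = [0] * n
    for i, j, w in mst_edges(points):
        total[i] += w
        total[j] += w
        degree[i] += 1
        degree[j] += 1
    eps = 1e-12  # guards against division by zero for duplicate points
    return [degree[i] / (total[i] + eps) for i in range(n)]


def initial_centers(points, k):
    """Greedily pick k points that are dense and far from chosen centers.

    Assumes the data contain at least k distinct points.
    """
    density = mst_density(points)
    chosen = [max(range(len(points)), key=density.__getitem__)]
    while len(chosen) < k:
        def score(i):
            # Dense points far from every chosen center score highest;
            # points already chosen score zero.
            gap = min(math.dist(points[i], points[c]) for c in chosen)
            return density[i] * gap
        chosen.append(max(range(len(points)), key=score))
    return [points[i] for i in chosen]


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
            (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
    print(initial_centers(data, 2))
```

The returned points could then be passed to any k-means implementation as its starting centers. The product score avoids a separate distance threshold, at the cost of being only one of several plausible ways to trade density against separation.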