To overcome the large fluctuation in clustering results caused by the random selection of initial cluster centers in the traditional K-means algorithm, an algorithm is proposed that determines the initial cluster centers using density and minimum distance as parameters. The algorithm first computes a density parameter for each data object according to certain rules. Once the single-point density of every object in the data set is known, it computes, for each object, the minimum distance to the other objects whose density is higher. Points for which both the density and this minimum distance are large are then selected as the initial cluster centers for K-means. Experimental results show that, when the number of clusters is given, the initial centers determined by this algorithm support K-means clustering on standard UCI data sets, and that K-means with the improved initial centers achieves higher accuracy and faster convergence than K-means with randomly selected centers and than other algorithms that use density as a parameter.
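The selection procedure described in the abstract can be illustrated with a minimal Python sketch. The abstract does not state the rule used to compute the density or how "both large" is decided, so everything specific here is an assumption made for illustration: the function name `density_distance_centers`, the density defined as the number of neighbours within a radius (defaulting to the mean pairwise distance), ties in density broken by index order, and the product of density and minimum distance used as the ranking score.

```python
import numpy as np


def density_distance_centers(X, k, radius=None):
    """Pick k initial centers with high density and high distance to denser points.

    Hypothetical sketch: the density rule, the tie-breaking, and the
    density * distance score are assumptions, not the paper's exact method.
    """
    X = np.asarray(X, dtype=float)
    n = len(X)

    # Pairwise Euclidean distance matrix (n x n).
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))

    if radius is None:
        # Assumption: use the mean pairwise distance as the neighbourhood radius.
        radius = dist.sum() / (n * (n - 1))

    # Assumed density rule: number of other points within the radius.
    density = (dist <= radius).sum(axis=1) - 1

    # Visit points from densest to sparsest; equal densities keep index order,
    # so each point only looks at points ranked ahead of it.
    order = np.argsort(-density, kind="stable")
    delta = np.empty(n)
    for rank, i in enumerate(order):
        if rank == 0:
            # The densest point has no denser neighbour: use its largest distance.
            delta[i] = dist[i].max()
        else:
            # Minimum distance to any point ranked as denser.
            delta[i] = dist[i, order[:rank]].min()

    # Assumed score: favour points where density and minimum distance are both large.
    score = density * delta
    idx = np.argsort(-score, kind="stable")[:k]
    return X[idx], idx
```

The returned array can then be passed as the starting centers to any K-means implementation that accepts explicit initial centers, for example scikit-learn's `KMeans(n_clusters=k, init=centers, n_init=1)`.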