The result of the traditional k-means clustering algorithm fluctuates with the choice of initial cluster centers. To address this drawback, an algorithm that optimizes the initial cluster centers is proposed. The algorithm computes a density parameter for every data object and then selects k data objects lying in high-density regions as the initial cluster centers. With the number of clusters given, comparative experiments on standard UCI datasets show that k-means with initial centers chosen by the improved method achieves relatively higher accuracy and stability than k-means with randomly chosen initial centers.
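The density-based selection of initial centers can be illustrated with a minimal Python sketch. The abstract does not define the density parameter, so the sketch assumes it is the number of other points within a fixed `radius`. It also assumes a `min_separation` rule so that the k chosen points do not all come from the same dense region. The function names, both parameters, and the toy data are hypothetical and are not taken from the paper.

```python
import math


def density_initial_centers(points, k, radius, min_separation=None):
    """Pick k initial centers from high-density points (illustrative sketch)."""
    if min_separation is None:
        min_separation = radius
    # Assumed density parameter: number of other points within `radius`.
    density = [
        sum(1 for j, q in enumerate(points)
            if j != i and math.dist(p, q) <= radius)
        for i, p in enumerate(points)
    ]
    # Visit points from densest to sparsest; ties keep input order,
    # so the selection is deterministic for a given dataset.
    order = sorted(range(len(points)), key=lambda i: -density[i])
    centers = []
    for i in order:
        # Assumed rule: skip candidates too close to an already chosen center.
        if all(math.dist(points[i], c) > min_separation for c in centers):
            centers.append(tuple(points[i]))
            if len(centers) == k:
                return centers
    raise ValueError("fewer than k well-separated points; lower min_separation")


def kmeans(points, centers, max_iter=100):
    """Standard k-means (Lloyd) iterations starting from the given centers."""
    centers = [tuple(c) for c in centers]
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)),
                          key=lambda c: math.dist(p, centers[c]))
            clusters[nearest].append(p)
        # Update step: each center moves to the mean of its cluster;
        # an empty cluster keeps its previous center.
        new_centers = [
            tuple(sum(col) / len(members) for col in zip(*members))
            if members else centers[c]
            for c, members in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers


# Toy data: two well-separated groups of five points each.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
          (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5)]
init = density_initial_centers(points, k=2, radius=1.0)
final = kmeans(points, init)
```

Because the selection involves no random draw, repeated runs on the same data start from the same centers, which is the kind of stability the abstract reports in comparison with random initialization. The choice of `radius` is left to the user in this sketch and would need tuning for real data.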