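Traditional K-Means clustering suffers from two problems: the number of clusters k is hard to preset accurately, and randomly chosen initial cluster centers reduce clustering accuracy and efficiency. To address these problems, this paper proposes a K-Means initial cluster center selection algorithm based on optimal partitioning. The algorithm uses a histogram method to partition the data sample space optimally and determines the initial cluster centers of K-Means from the distribution of the data samples themselves. It requires no preset value of k, which reduces the dependence of the results on parameters and improves both the running efficiency and the accuracy of the algorithm. Experimental results show that K-Means improved with this algorithm runs in markedly less time, and that both the accuracy of its clustering results and its efficiency improve significantly.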
In clustering with the traditional K-Means algorithm, the number of clusters k is difficult to determine in advance, and random selection of the initial cluster centers reduces the accuracy and efficiency of the algorithm. An algorithm for initializing K-Means cluster centers based on optimized division is proposed. The new algorithm divides the data sample space optimally using a histogram method and identifies the initial cluster centers according to the natural characteristics of the data space. Experimental results demonstrate that the number of iterations of the K-Means algorithm is clearly decreased, and that the accuracy of the clustering results and the efficiency of the algorithm are improved.
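
The general idea of histogram-based initialization can be illustrated with a minimal sketch. The abstract does not state the paper's optimal-partition criterion, so the code below is not the authors' method. It assumes a fixed equal-width grid and takes every sufficiently dense cell that is a local maximum of the histogram as one initial center, so that k follows from the data. The function names `histogram_initial_centers` and `kmeans` and the parameters `bins` and `min_frac` are hypothetical and chosen only for illustration.

```python
import numpy as np


def histogram_initial_centers(X, bins=8, min_frac=0.05):
    """Pick initial centers from dense grid cells (illustrative assumption,
    not the paper's optimal-partition rule)."""
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    # Equal-width bin edges per dimension.
    edges = [np.linspace(X[:, j].min(), X[:, j].max(), bins + 1) for j in range(d)]
    # Cell index of every point; the maximum value is clipped into the last bin.
    idx = np.stack(
        [
            np.clip(np.searchsorted(edges[j], X[:, j], side="right") - 1, 0, bins - 1)
            for j in range(d)
        ],
        axis=1,
    )
    counts = np.zeros((bins,) * d, dtype=int)
    np.add.at(counts, tuple(idx.T), 1)

    min_count = max(1, min_frac * n)
    centers = []
    for cell in np.argwhere(counts >= min_count):
        # Keep the cell only if no neighbouring cell holds more points.
        # Neighbouring cells with exactly equal counts are both kept.
        lo = np.maximum(cell - 1, 0)
        hi = np.minimum(cell + 2, bins)
        window = counts[tuple(slice(a, b) for a, b in zip(lo, hi))]
        if counts[tuple(cell)] < window.max():
            continue
        members = X[np.all(idx == cell, axis=1)]
        centers.append(members.mean(axis=0))
    return np.array(centers)


def kmeans(X, centers, max_iter=100):
    """Standard Lloyd iterations started from the given centers."""
    X = np.asarray(X, dtype=float)
    centers = np.array(centers, dtype=float)
    for it in range(1, max_iter + 1):
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        new_centers = np.array(
            [
                X[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
                for c in range(len(centers))
            ]
        )
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, labels, it
```

Calling `kmeans(X, histogram_initial_centers(X))` runs K-Means without a user-supplied k. In this sketch the grid resolution `bins` and the density threshold `min_frac` take over the role of k as tuning parameters. An optimal partition, as described in the abstract, would presumably choose the division from the data instead.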