This paper studies how the choice of initial cluster centers affects the clustering accuracy and efficiency of the K-means algorithm, and proposes an improved algorithm, the LK algorithm (Leader + K-means). In the LK algorithm the initial cluster centers are not chosen at random. Instead, the Leader algorithm is first run to obtain a number of candidate cluster centers, and the k centers whose clusters contain the most data items are then used as the initial centers for K-means. Experimental results show that the LK algorithm is effective and feasible in terms of both the stability and the accuracy of the clustering results.
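
The two-stage procedure described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not specify the distance measure, the Leader threshold, or whether a candidate center is the leader point itself or the mean of its cluster. The sketch assumes Euclidean distance, a user-supplied `threshold`, and the leader point as the center. The function names (`leader_clusters`, `lk_initial_centers`, `k_means`) are hypothetical.

```python
import math


def leader_clusters(points, threshold):
    """Single-pass Leader clustering; returns (leaders, member_counts).

    Assumption: a point joins its nearest leader if it lies within
    `threshold` (Euclidean distance); otherwise it becomes a new leader.
    """
    leaders, counts = [], []
    for p in points:
        best, best_d = None, float("inf")
        for i, q in enumerate(leaders):
            d = math.dist(p, q)
            if d < best_d:
                best, best_d = i, d
        if best is not None and best_d <= threshold:
            counts[best] += 1
        else:
            leaders.append(tuple(p))
            counts.append(1)
    return leaders, counts


def lk_initial_centers(points, k, threshold):
    """Pick the k leaders whose clusters contain the most data items."""
    leaders, counts = leader_clusters(points, threshold)
    if len(leaders) < k:
        raise ValueError("threshold too large: fewer than k leaders found")
    # Stable sort: ties keep the order in which leaders were created.
    order = sorted(range(len(leaders)), key=lambda i: -counts[i])
    return [leaders[i] for i in order[:k]]


def k_means(points, centers, max_iter=100):
    """Standard Lloyd iterations starting from the given centers."""
    centers = [tuple(c) for c in centers]
    for _ in range(max_iter):
        groups = [[] for _ in centers]
        for p in points:
            j = min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
            groups[j].append(p)
        # An empty group keeps its previous center.
        new_centers = [
            tuple(sum(x) / len(g) for x in zip(*g)) if g else c
            for g, c in zip(groups, centers)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    labels = [
        min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
        for p in points
    ]
    return centers, labels
```

A typical call would be `k_means(points, lk_initial_centers(points, k, threshold))`. Because the Leader pass depends on the order of the data and on `threshold`, both would need to be chosen with care in practice. For a fixed data order the resulting initialization is deterministic, which is consistent with the stability claim in the abstract.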