Because K-means selects its initial cluster centers at random, its clustering results can be unstable, and the number of clusters estimated by the Gap statistic can be unstable as a result. To address these shortcomings, a modified affinity propagation (AP) algorithm, mAP, is proposed. mAP takes the global distribution of the data into account and assigns a different preference value P to each point. Replacing K-means with mAP in the Gap statistic yields the mAP-based Gap statistic, mAPGap. mAP obtains good clustering results in a short time and requires neither the initial cluster centers nor the number of clusters to be set in advance, so its clustering results are more stable. Experimental results show that mAPGap outperforms the original Gap statistic in both the stability of the estimated number of clusters and clustering accuracy.
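The Gap-statistic pipeline with a pluggable clusterer can be sketched as follows. This is a minimal illustration under stated assumptions, not the mAPGap method itself: the abstract does not say how mAP assigns P values, so `deterministic_cluster` is a hypothetical stand-in (farthest-point seeding followed by Lloyd updates), chosen only because, like mAP, it uses no random initialization. The function names, the uniform bounding-box reference distribution, and the toy data are likewise assumptions, and the standard-error term of the usual Gap selection rule is omitted for brevity.

```python
import numpy as np

def within_dispersion(X, labels):
    """W_k: pooled sum of squared distances from each point to its cluster mean."""
    return sum(((X[labels == c] - X[labels == c].mean(axis=0)) ** 2).sum()
               for c in np.unique(labels))

def deterministic_cluster(X, k, n_iter=50):
    """Hypothetical stand-in for mAP (NOT the paper's algorithm).

    Farthest-point seeding plus Lloyd updates; there is no random
    initialization, so repeated runs return identical labels.
    """
    centers = [X[0]]
    for _ in range(1, k):
        # Squared distance from every point to its nearest chosen center.
        d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d2)])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for c in range(k):
            if np.any(labels == c):  # keep the old center if a cluster empties
                centers[c] = X[labels == c].mean(axis=0)
    return labels

def gap_statistic(X, k_values, cluster_fn, n_refs=10, seed=0):
    """Gap(k) = mean(log W_k on reference data) - log W_k on the real data."""
    rng = np.random.default_rng(seed)
    lo, hi = X.min(axis=0), X.max(axis=0)
    # Assumed reference distribution: uniform over the data's bounding box.
    refs = [rng.uniform(lo, hi, size=X.shape) for _ in range(n_refs)]
    gaps = {}
    for k in k_values:
        log_w = np.log(within_dispersion(X, cluster_fn(X, k)))
        ref_log_w = [np.log(within_dispersion(R, cluster_fn(R, k))) for R in refs]
        gaps[k] = float(np.mean(ref_log_w) - log_w)
    return gaps

# Toy data (an assumption for illustration): two tight 3x3 grids, 10 units apart.
offsets = np.array([[dx, dy] for dx in (-0.2, 0.0, 0.2) for dy in (-0.2, 0.0, 0.2)])
X = np.vstack([offsets, offsets + [10.0, 0.0]])
gaps = gap_statistic(X, k_values=[1, 2, 3], cluster_fn=deterministic_cluster)
print(gaps)
```

On this toy data, `gaps[2]` should exceed `gaps[1]` by a wide margin, since splitting the two groups sharply reduces the within-cluster dispersion. Any clusterer with the signature `cluster_fn(X, k) -> labels` can be passed in, which is the point at which an mAP implementation would replace K-means.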