In cluster analysis, determining the optimal number of clusters is crucial to clustering quality. To address this problem, two concepts, the sample clustering distance and the sample clustering deviation distance, are defined from the standpoint of the geometric structure of the samples, and a new clustering validity index is designed. On this basis, a method for determining the optimal number of clusters based on the affinity propagation clustering algorithm is proposed. Theoretical analysis and experimental results show that the proposed index and method evaluate clustering results effectively and are suitable for determining the optimal number of clusters of a sample set.
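The general workflow described above (generate candidate clusterings with affinity propagation, score each with a validity index, keep the cluster number with the best score) can be illustrated with a minimal sketch. The abstract does not give the formulas for the proposed index, so the standard silhouette coefficient is used here purely as a stand-in. The preference sweep, the synthetic data, and all function names (`affinity_propagation`, `silhouette`, `score_cluster_numbers`) are assumptions made for illustration, not the authors' implementation.

```python
import random
from math import dist
from statistics import median

def affinity_propagation(S, damping=0.7, iters=150):
    """Plain message-passing affinity propagation on a similarity matrix S."""
    n = len(S)
    R = [[0.0] * n for _ in range(n)]  # responsibilities r(i, k)
    A = [[0.0] * n for _ in range(n)]  # availabilities a(i, k)
    for _ in range(iters):
        # r(i,k) = s(i,k) - max over k' != k of (a(i,k') + s(i,k'))
        for i in range(n):
            row = [A[i][k] + S[i][k] for k in range(n)]
            best = max(range(n), key=row.__getitem__)
            first = row[best]
            second = max(v for k, v in enumerate(row) if k != best)
            for k in range(n):
                new = S[i][k] - (second if k == best else first)
                R[i][k] = damping * R[i][k] + (1 - damping) * new
        # a(i,k) = min(0, r(k,k) + sum of positive r(i',k), i' not in {i,k})
        # a(k,k) = sum of positive r(i',k), i' != k
        for k in range(n):
            pos = sum(max(0.0, R[i][k]) for i in range(n) if i != k)
            for i in range(n):
                if i == k:
                    new = pos
                else:
                    new = min(0.0, R[k][k] + pos - max(0.0, R[i][k]))
                A[i][k] = damping * A[i][k] + (1 - damping) * new
    exemplars = [k for k in range(n) if R[k][k] + A[k][k] > 0]
    if not exemplars:
        return None
    # Assign every point to its most similar exemplar.
    labels = [max(range(len(exemplars)), key=lambda c: S[i][exemplars[c]])
              for i in range(n)]
    for c, k in enumerate(exemplars):
        labels[k] = c  # an exemplar always belongs to its own cluster
    return labels

def silhouette(X, labels):
    """Mean silhouette coefficient (stand-in for the paper's validity index)."""
    clusters = {}
    for i, c in enumerate(labels):
        clusters.setdefault(c, []).append(i)
    total = 0.0
    for i, c in enumerate(labels):
        own = clusters[c]
        if len(own) == 1:
            continue  # singleton clusters contribute a score of 0
        a = sum(dist(X[i], X[j]) for j in own) / (len(own) - 1)
        b = min(sum(dist(X[i], X[j]) for j in other) / len(other)
                for d, other in clusters.items() if d != c)
        total += (b - a) / max(a, b)
    return total / len(X)

def score_cluster_numbers(X, factors=(0.01, 0.05, 0.2, 1.0, 3.0)):
    """Sweep the AP preference and record the best index value per cluster count."""
    n = len(X)
    base = [[-dist(X[i], X[k]) ** 2 for k in range(n)] for i in range(n)]
    med = median(base[i][k] for i in range(n) for k in range(n) if i != k)
    results = {}
    for f in factors:
        S = [row[:] for row in base]
        for i in range(n):
            S[i][i] = f * med  # shared preference controls how many exemplars emerge
        labels = affinity_propagation(S)
        if labels is None:
            continue
        k = len(set(labels))
        if 2 <= k < n:
            results[k] = max(silhouette(X, labels), results.get(k, -1.0))
    return results

# Synthetic data (assumed for illustration): three Gaussian blobs in the plane.
rng = random.Random(0)
centers = [(0.0, 0.0), (8.0, 0.0), (4.0, 7.0)]
X = [(cx + rng.gauss(0, 0.6), cy + rng.gauss(0, 0.6))
     for cx, cy in centers for _ in range(10)]

results = score_cluster_numbers(X)
best_k = max(results, key=results.get)
print("index value per cluster number:", results)
print("selected number of clusters:", best_k)
```

Swapping `silhouette` for another validity index only requires a function with the same `(X, labels)` signature, which is where an index built from the sample clustering distance and sample clustering deviation distance would plug in.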