By analyzing the pairwise similarities between samples that a kernel function implicitly defines, this paper introduces two statistical quantities, the within-cluster and between-cluster individual average similarity coefficients, which reflect the similarity of samples inside a cluster and across clusters, respectively. Based on these coefficients, an efficient validity index is designed for evaluating the clustering quality of kernel clustering algorithms. The index has a clear physical meaning, is simple to compute, and shows a degree of robustness to the kernel parameter (the Gaussian kernel width). Building on this index, a self-adaptive kernel clustering (SAKC) algorithm is proposed that automatically determines the optimal number of clusters and the optimal partition. Experimental results on benchmark datasets verify the effectiveness and good performance of both the proposed validity index and the SAKC algorithm.
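As a rough illustration of the idea behind the two coefficients, the sketch below averages Gaussian kernel values over same-cluster pairs and over different-cluster pairs. The abstract does not give the paper's exact definitions or the form of the validity index, so the function names (`gaussian_kernel`, `average_similarities`), the plain pairwise averaging, and the toy data are all assumptions made for illustration and are not the authors' formulation.

```python
import math


def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian (RBF) kernel value between two points given as sequences."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def average_similarities(points, labels, sigma=1.0):
    """Return (within, between) average kernel similarities.

    Hypothetical definitions: `within` is the mean kernel value over all
    pairs sharing a label, `between` is the mean over pairs with different
    labels. The paper's own coefficients may be defined differently.
    """
    within_sum = between_sum = 0.0
    within_n = between_n = 0
    n = len(points)
    for i in range(n):
        for j in range(i + 1, n):
            k = gaussian_kernel(points[i], points[j], sigma)
            if labels[i] == labels[j]:
                within_sum += k
                within_n += 1
            else:
                between_sum += k
                between_n += 1
    within = within_sum / within_n if within_n else 0.0
    between = between_sum / between_n if between_n else 0.0
    return within, between


# Toy data (assumed): two tight, well-separated groups of two points each.
points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]
good = average_similarities(points, [0, 0, 1, 1])  # labels match the groups
bad = average_similarities(points, [0, 1, 0, 1])   # labels split each group
```

On this toy data, the partition that matches the two groups yields a high within-cluster average and a between-cluster average near zero, while the mismatched partition reverses that ordering. A validity index built from such quantities could plausibly be used to compare candidate partitions and cluster counts, which is the role the abstract assigns to the index within SAKC.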