To address the sensitivity of traditional Gaussian-kernel spectral clustering to the choice of scale parameter, this paper proposes a new spectral clustering algorithm that replaces the single global scale with a neighbor-adaptive local scale. Building on the clustering consistency characteristics of the data, the algorithm first emphasizes the flexibility of the local scale: each sample has its own scale parameter, which overcomes the limitation of traditional methods, where all samples share one global scale parameter, and better captures the intrinsic structure of the data set. Second, it stresses the convenience of parameter selection: the local scale of a sample is computed as the weighted sum of the distances to its N nearest neighbors, so the scale parameter is selected automatically. Theoretical analysis and experiments show that the algorithm not only suppresses outliers to some extent but also accurately clusters data whose classes have different scale distributions. Finally, experiments on artificial and UCI data sets verify the effectiveness of the algorithm.
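The local-scale idea described above can be illustrated with a minimal sketch. The abstract does not give the weighting scheme or the exact kernel form, so the code below makes two assumptions that are not taken from the paper: uniform weights over the N neighbors, and an affinity of the form exp(-d_ij^2 / (sigma_i * sigma_j)), a common way of combining per-sample scales in a Gaussian kernel. The function names `local_scales` and `local_scale_affinity` are hypothetical.

```python
import numpy as np


def local_scales(X, n_neighbors=7, weights=None, eps=1e-12):
    """Return pairwise distances and one scale per sample.

    Each scale is the weighted sum of the distances from a sample to its
    n_neighbors nearest neighbors. Uniform weights are an assumption made
    for this sketch; the abstract does not specify the weighting.
    """
    X = np.asarray(X, dtype=float)
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # After sorting each row, column 0 is the sample itself (distance 0),
    # so columns 1..n_neighbors hold the nearest-neighbor distances.
    nearest = np.sort(dist, axis=1)[:, 1:n_neighbors + 1]
    if weights is None:
        weights = np.full(n_neighbors, 1.0 / n_neighbors)
    sigma = nearest @ np.asarray(weights, dtype=float)
    # Guard against zero scales caused by duplicated points.
    return dist, np.maximum(sigma, eps)


def local_scale_affinity(X, n_neighbors=7, weights=None):
    """Gaussian affinity with per-sample scales (assumed product form)."""
    dist, sigma = local_scales(X, n_neighbors, weights)
    A = np.exp(-dist ** 2 / (sigma[:, None] * sigma[None, :]))
    np.fill_diagonal(A, 0.0)  # no self-similarity, as is usual in spectral clustering
    return A, sigma
```

Under these assumptions, a sample in a dense cluster receives a small scale and a sample in a sparse cluster a large one, so clusters of different spread can yield comparable within-cluster affinities. An isolated outlier receives a large scale of its own, but its affinity to any cluster point is still limited by that point's small scale. The resulting matrix could then be passed to any spectral clustering routine that accepts a precomputed affinity.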