In data mining, determining the number of clusters from the similarity between data points is a long-standing problem for clustering algorithms. Building on the classical spectral clustering algorithm, this paper proposes a spectral clustering algorithm that identifies the number of clusters from the eigengap (Spectral Clustering with Identifying Clustering Number based on Eigengap, SC-ICNE). The algorithm constructs the normalized graph Laplacian, computes its eigenvalues and corresponding eigenvectors in order, and obtains the gaps between adjacent eigenvalues; the position of the eigengap determines the number of clusters k. The data set is then clustered by running the k-means algorithm on the first k eigenvectors. Simulations analyze the effect of the Gaussian similarity function on the clustering performance of SC-ICNE and compare SC-ICNE with the k-means algorithm on non-convex, non-spherical data sets and on UCI data sets. The results verify the effectiveness of SC-ICNE in terms of both identifying the number of clusters and clustering accuracy.
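The pipeline described above (Gaussian similarity, normalized Laplacian, eigengap, k-means on the first k eigenvectors) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. Several details are assumptions that the abstract does not specify: the symmetric normalization L = I - D^{-1/2} W D^{-1/2}, choosing k at the largest gap among the first `max_k` gaps, row-normalizing the eigenvector matrix before k-means, and the hypothetical names `sc_eigengap`, `sigma`, and `max_k`.

```python
import numpy as np

def gaussian_affinity(X, sigma):
    """Gaussian similarity w_ij = exp(-||x_i - x_j||^2 / (2 sigma^2)), zero diagonal."""
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    return W

def normalized_laplacian(W):
    """Symmetric normalized Laplacian L = I - D^{-1/2} W D^{-1/2} (assumed variant)."""
    d = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(d, 1e-12))
    return np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]

def eigengap_k(eigvals, max_k):
    """Assumed rule: k is the position of the largest gap between adjacent
    eigenvalues (ascending order), searched over the first max_k gaps."""
    gaps = np.diff(eigvals[: max_k + 1])
    return int(np.argmax(gaps)) + 1

def kmeans(Y, k, n_iter=100):
    """Plain Lloyd k-means with deterministic farthest-point initialization."""
    centers = [Y[0]]
    for _ in range(1, k):
        dist = np.min([np.sum((Y - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(Y[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(n_iter):
        d2 = ((Y[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = np.argmin(d2, axis=1)
        new_centers = np.array([
            Y[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

def sc_eigengap(X, sigma=1.0, max_k=10):
    """Estimate k from the eigengap, then cluster the spectral embedding."""
    W = gaussian_affinity(X, sigma)
    L = normalized_laplacian(W)
    eigvals, eigvecs = np.linalg.eigh(L)  # eigenvalues in ascending order
    k = eigengap_k(eigvals, max_k)
    U = eigvecs[:, :k]                    # first k eigenvectors as columns
    # Row normalization is an assumption, not stated in the abstract.
    U = U / np.maximum(np.linalg.norm(U, axis=1, keepdims=True), 1e-12)
    return k, kmeans(U, k)
```

A call such as `k, labels = sc_eigengap(X, sigma=1.0)` on an `(n, d)` array returns the estimated number of clusters and one label per point. Since `sigma` sets the scale of the Gaussian similarity, it also affects the eigenvalue spectrum and therefore the detected k, which is the sensitivity the simulations in the paper examine.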