Based on the local and global consistency of the distribution of sample data points, a spectral clustering algorithm that constructs the similarity matrix from local density is proposed. First, a definition of local density is given by analyzing the distribution characteristics of the sample data points; the sample points are then sorted from dense to sparse by local density, and an undirected graph is built according to a designed connection strategy. Next, drawing on the idea of the GN (Girvan-Newman) algorithm, a method for computing the weight matrix from edge betweenness is given, and the similarity matrix for spectral clustering is obtained through data transformation. Finally, the number of clusters is determined from the position of the first maximal eigengap, and the data points in the eigenvector space are clustered with a classical clustering method. Tests on artificial simulated data sets and UCI data sets show that the proposed spectral clustering algorithm has good robustness and clustering capability.
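The last stage of the pipeline (choosing the number of clusters from the eigengap, then clustering in the eigenvector space) can be illustrated with a minimal sketch. Several things here are assumptions and not taken from the paper: a plain Gaussian similarity stands in for the density-ordered graph and edge-betweenness weights, whose exact definitions the abstract does not give; the symmetric normalized Laplacian and row normalization are one common choice; k-means is used as the "classical clustering method"; and the function names `normalized_laplacian`, `estimate_k_by_eigengap`, and `kmeans` are hypothetical.

```python
import numpy as np

def normalized_laplacian(W):
    """Symmetric normalized Laplacian I - D^{-1/2} W D^{-1/2} (assumed form)."""
    d = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(d)
    return np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]

def estimate_k_by_eigengap(W, max_k=10):
    """Pick k at the largest gap among the smallest Laplacian eigenvalues.

    np.argmax returns the first position at which the maximum occurs.
    The paper's exact eigengap criterion may differ from this heuristic.
    """
    eigvals, eigvecs = np.linalg.eigh(normalized_laplacian(W))  # ascending order
    m = min(max_k, len(eigvals) - 1)
    gaps = np.diff(eigvals[: m + 1])   # gaps[i] = eigvals[i + 1] - eigvals[i]
    k = int(np.argmax(gaps)) + 1
    return k, eigvecs[:, :k]           # k and the k leading eigenvectors

def kmeans(X, k, n_iter=50):
    """Plain Lloyd iterations with deterministic farthest-point initialization."""
    centers = [X[0]]
    for _ in range(1, k):
        dist = np.min([np.sum((X - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(n_iter):
        sq_dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = np.argmin(sq_dist, axis=1)
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

# Toy data: three well-separated 2-D blobs of 20 points each (illustrative only).
rng = np.random.default_rng(0)
blob_centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
X = np.vstack([c + 0.3 * rng.standard_normal((20, 2)) for c in blob_centers])

# Stand-in similarity matrix: Gaussian kernel with sigma = 1 and zero diagonal.
sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
W = np.exp(-sq / 2.0)
np.fill_diagonal(W, 0.0)

k, U = estimate_k_by_eigengap(W)
U = U / np.linalg.norm(U, axis=1, keepdims=True)  # row-normalize the embedding
labels = kmeans(U, k)
print("estimated number of clusters:", k)
```

To approximate the proposed method more closely, `W` would be replaced by the similarity matrix derived from the edge-betweenness weights of the density-ordered undirected graph; the eigengap and clustering steps would stay unchanged.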