Spectral clustering has been an active research topic in machine learning in recent years. It is based on algebraic graph theory and can effectively solve many practical problems. However, traditional spectral clustering algorithms do not handle high-dimensional data well and are easily disturbed by noise and irrelevant attributes. To reduce computational complexity and to weaken the negative impact of noisy data and redundant attributes on clustering, this paper proposes a spectral clustering algorithm based on neighborhood rough set reduction (NRSR-SC). The algorithm introduces information entropy into neighborhood rough sets so that, while preserving the ability to discriminate between samples, redundant attributes are removed and the attributes that contribute most to clustering are retained. Then, based on the reduced attribute set, the similarities between sample points are calculated to construct the affinity matrix and the Laplacian matrix. Finally, a spectral method is used to obtain the final clustering results. Experiments show that, when dealing with high-dimensional data, the NRSR-SC algorithm is robust to interference, and both its running efficiency and its accuracy improve noticeably.
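The later stages of the pipeline described above (similarity computation on the reduced attributes, Laplacian construction, spectral embedding) can be illustrated with a minimal sketch. This is not the paper's implementation. The Gaussian kernel, the symmetric normalized Laplacian, the row normalization, and the small k-means routine with farthest-point initialization are all assumptions, since the abstract does not specify them. The function names and the parameters `reduced_attrs` and `sigma` are hypothetical, and the attribute reduction step itself is not shown: its output is simply passed in as a list of column indices.

```python
import numpy as np


def kmeans(U, k, n_iter=50):
    """Plain Lloyd's k-means with deterministic farthest-point initialization."""
    centers = [U[0]]
    for _ in range(1, k):
        # Squared distance from every point to its nearest chosen center.
        d = np.min([((U - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(U[d.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        dist = ((U[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = U[labels == j].mean(axis=0)
    return labels


def spectral_clustering(X, reduced_attrs, k, sigma=1.0):
    """Cluster the rows of X using only the columns listed in reduced_attrs."""
    # Keep only the attributes retained by the (assumed, not shown) reduction step.
    Xr = X[:, reduced_attrs]
    # Gaussian affinity matrix (assumed kernel), with a zero diagonal.
    sq = ((Xr[:, None, :] - Xr[None, :, :]) ** 2).sum(axis=-1)
    W = np.exp(-sq / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    # Symmetric normalized Laplacian: L = I - D^{-1/2} W D^{-1/2}.
    d = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(d, 1e-12))
    L = np.eye(len(Xr)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    # Eigenvectors of the k smallest eigenvalues form the spectral embedding.
    _, vecs = np.linalg.eigh(L)
    U = vecs[:, :k]
    U = U / np.maximum(np.linalg.norm(U, axis=1, keepdims=True), 1e-12)
    # Cluster the embedded points.
    return kmeans(U, k)
```

For example, calling `spectral_clustering(X, [0, 1], k=2)` clusters on the first two attributes only, so a noisy third attribute has no influence on the affinity matrix. Dropping columns before building the affinity matrix also shrinks the cost of the pairwise distance computation, which is linear in the number of attributes.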