Spectral Clustering Algorithm Based on Neighborhood Rough Sets Reduction

ISSN: 0469-5097
Journal: 《南京大学学报：自然科学版》 (Journal of Nanjing University: Natural Sciences Edition)
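Classification: TP391.41 [Automation and Computer Technology: Computer Application Technology; Automation and Computer Technology: Computer Science and Technology]
Author affiliations: [1] School of Computer Science and Technology, China University of Mining and Technology, Xuzhou 221116; [2] Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190
Funding: National Key Basic Research Program of China (2013CB329502); National Natural Science Foundation of China (61379101, 51104157)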

Keywords: neighborhood rough sets, information entropy, attribute reduction, spectral clustering

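Abstract (translated from the Chinese):

Spectral clustering has been a research hotspot in machine learning in recent years. It is grounded in algebraic graph theory and can effectively solve many practical problems. However, traditional spectral clustering algorithms cannot handle high-dimensional data well and are easily disturbed by noise and irrelevant attributes. To reduce computational complexity while weakening the negative influence of noisy data and redundant attributes on clustering, a spectral clustering algorithm based on neighborhood rough sets reduction (NRSR-SC) is proposed. The algorithm introduces information entropy into neighborhood rough sets and, while preserving the ability to discriminate between samples, removes redundant attributes and retains those that contribute most to clustering. Then, based on the reduced attribute set, it computes the similarity between sample points and constructs the similarity matrix and the Laplacian matrix. Finally, a spectral method is used to obtain the final clustering result. Experiments show that, when processing high-dimensional data, NRSR-SC is strongly resistant to interference, and that both its running efficiency and its accuracy improve markedly.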

English abstract:

Spectral clustering has been an active research topic in machine learning in recent years. It is based on algebraic graph theory and can effectively solve many practical problems. However, because it suffers from the interference of noise and irrelevant attributes, the traditional spectral clustering algorithm does not work well on high-dimensional data. In order to reduce the computational complexity and weaken the negative impact of noisy data and redundant attributes on clustering, this paper proposes a spectral clustering algorithm based on neighborhood rough sets reduction (NRSR-SC). The algorithm introduces information entropy into neighborhood rough sets, so that redundant attributes can be removed and the attributes that contribute most to clustering can be retained, while preserving the ability to distinguish different kinds of samples. Then, based on the reduced attribute set, the similarities between sample points are calculated to construct the affinity matrix and the Laplacian matrix. Finally, a spectral method is used to obtain the final clustering results. Experiments show that, when dealing with high-dimensional data, the NRSR-SC algorithm is strongly resistant to interference, and that its efficiency and accuracy improve significantly.
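The abstract names three stages (entropy-based attribute reduction, affinity and Laplacian construction, spectral partitioning) but gives no formulas. The following is therefore only a minimal sketch under stated assumptions, not the authors' implementation. It assumes a neighborhood entropy of the form -(1/n) Σ log(|δ(x_i)|/n), Chebyshev distance with a hypothetical radius `delta`, greedy forward selection, a Gaussian affinity with a hypothetical width `sigma`, the symmetric normalized Laplacian, and a small deterministic k-means. All function names and parameters are illustrative.

```python
import numpy as np


def neighborhood_entropy(X, attrs, delta):
    """Assumed definition: -(1/n) * sum_i log(|neighborhood(x_i)| / n)."""
    n = X.shape[0]
    if len(attrs) == 0:
        return 0.0  # with no attributes, every sample neighbors every other
    sub = X[:, attrs]
    # Chebyshev distance: adding attributes can only shrink neighborhoods,
    # so this entropy never decreases as the subset grows.
    dist = np.abs(sub[:, None, :] - sub[None, :, :]).max(axis=2)
    sizes = (dist <= delta).sum(axis=1)  # each sample counts itself
    return float(-np.mean(np.log(sizes / n)))


def reduce_attributes(X, delta=0.15, tol=1e-3):
    """Greedy forward selection until the subset's entropy matches the
    entropy of the full attribute set (an assumed stopping rule)."""
    m = X.shape[1]
    target = neighborhood_entropy(X, list(range(m)), delta)
    selected, current = [], 0.0
    while target - current > tol and len(selected) < m:
        rest = [a for a in range(m) if a not in selected]
        scores = [neighborhood_entropy(X, selected + [a], delta) for a in rest]
        best = int(np.argmax(scores))
        selected.append(rest[best])
        current = scores[best]
    return selected


def kmeans(Y, k, iters=100):
    """Plain Lloyd iterations with farthest-point initialisation
    (chosen only to keep the sketch deterministic)."""
    centers = [Y[0]]
    for _ in range(1, k):
        d = np.min([((Y - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(Y[int(np.argmax(d))])
    centers = np.array(centers)
    for _ in range(iters):
        sq = ((Y[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = sq.argmin(axis=1)
        new = np.array([Y[labels == j].mean(axis=0) if np.any(labels == j)
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels


def spectral_clustering(X, k, sigma=0.3):
    """Normalized spectral clustering on the given attribute columns."""
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    W = np.exp(-sq / (2.0 * sigma ** 2))      # Gaussian affinity matrix
    np.fill_diagonal(W, 0.0)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(W.sum(axis=1), 1e-12))
    # Symmetric normalized Laplacian: I - D^{-1/2} W D^{-1/2}
    L = np.eye(len(X)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L)               # eigenvalues in ascending order
    Y = vecs[:, :k]                           # k smallest eigenvectors
    Y = Y / np.maximum(np.linalg.norm(Y, axis=1, keepdims=True), 1e-12)
    return kmeans(Y, k)


# Toy data (hypothetical): one informative attribute, an exact copy of it,
# and a constant attribute; attributes are already scaled to [0, 1].
rng = np.random.default_rng(0)
base = np.concatenate([0.10 + 0.05 * rng.random(20),
                       0.85 + 0.05 * rng.random(20)])
X_demo = np.column_stack([base, base, np.full(40, 0.5)])

selected_demo = reduce_attributes(X_demo, delta=0.15)
labels_demo = spectral_clustering(X_demo[:, selected_demo], k=2, sigma=0.3)
print("kept attributes:", selected_demo)
print("cluster labels:", labels_demo)
```

On this toy data the duplicated and constant columns add no entropy, so the reduction keeps a single column before the affinity matrix is built. Note that this particular entropy criterion only discards redundant attributes; an attribute of pure random noise would raise the entropy and be kept, so the noise handling the abstract reports must rely on details this sketch does not cover.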
