
Application of Block Principal Component Analysis to Text Feature Extraction

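ISSN: 1672-6871
Journal: 《河南科技大学学报：自然科学版》 (Journal of Henan University of Science and Technology, Natural Science Edition)
Date: 0
Classification: TP [Automation and Computer Technology]
Author affiliation: School of Information Science and Technology, Southwest Jiaotong University, Chengdu 611756
Funding: Supported by the National Science and Technology Support Program (2015BAH19F02), the National Natural Science Foundation of China (61262058, 61572407), the Online Education Research Fund of the Ministry of Education Online Education Research Center (Quantong Education) (2016YB158), and the Fundamental Research Funds for the Central Universities at Southwest Jiaotong University (A0920502051515-12)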

Keywords: clustering ensemble, affinity propagation, density peaks, similarity matrix

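Chinese abstract (translated):

The purpose of clustering ensemble is to improve the accuracy, stability, and robustness of clustering results. Integrating multiple base clustering results can produce a better result. This paper proposes a clustering ensemble model based on density peaks and makes three main contributions: 1) after studying existing clustering ensemble algorithms and models, it finds that the base clustering results can be represented by density; 2) it uses an improved maximal information coefficient (rapid computation of the maximal information coefficient, RapidMic) to represent the correlation between base clustering results, and uses this correlation to measure the density relationships among the original data points after they have been clustered by the base clusterers; 3) it improves the density peaks (DP) algorithm for clustering ensemble. Finally, several standard datasets are used to evaluate the proposed model. Experimental results show that, compared with classical clustering ensemble models, the proposed model achieves better clustering ensemble performance.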

English abstract:

Clustering ensemble aims to improve the accuracy, stability and robustness of clustering results. A good ensemble result is achieved by integrating multiple base clustering results. This paper proposes a clustering ensemble model based on density peaks. First, this paper discovers that the base clustering results can be expressed with density after studying and analyzing the existing clustering algorithms and models. Second, rapid computation of the maximal information coefficient (RapidMic) is introduced to represent the correlation of the base clustering results, which is then used to measure the density of these original datasets after base clustering. Third, the density peak (DP) algorithm is improved for clustering ensemble. Furthermore, some standard datasets are used to evaluate the proposed model. Experimental results show that our model is effective and greatly outperforms some classical clustering ensemble models.
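
The general idea of running density peaks over a pairwise relationship derived from base clusterings can be illustrated with a minimal sketch. It is not the authors' model. As an assumption, it replaces the RapidMic correlation with a plain co-association matrix (the fraction of base clusterings that place two points in the same cluster), and it uses the basic density peaks procedure rather than the improved variant the abstract mentions. The function names `co_association` and `density_peaks`, the cutoff `dc`, and the toy data are all hypothetical.

```python
import numpy as np

def co_association(base_labels):
    """Fraction of base clusterings that place each pair of points together.

    Assumption: this stands in for the RapidMic-based correlation of the paper.
    """
    base_labels = np.asarray(base_labels)            # (n_clusterings, n_points)
    same = base_labels[:, :, None] == base_labels[:, None, :]
    return same.mean(axis=0)                         # (n_points, n_points)

def density_peaks(dist, n_clusters, dc):
    """Basic density peaks clustering on a precomputed distance matrix."""
    n = dist.shape[0]
    # Local density: number of *other* points closer than the cutoff dc.
    rho = (dist < dc).sum(axis=1) - 1
    # Visit points from densest to sparsest (stable sort breaks ties by index).
    order = np.argsort(-rho, kind="stable")
    delta = np.zeros(n)            # distance to the nearest denser point
    parent = np.full(n, -1)        # index of that nearest denser point
    for rank, i in enumerate(order):
        if rank == 0:
            delta[i] = dist[i].max()   # densest point: use its largest distance
            continue
        denser = order[:rank]
        j = denser[np.argmin(dist[i, denser])]
        delta[i], parent[i] = dist[i, j], j
    # Centers combine high density with large separation from denser points.
    gamma = rho * delta
    gamma[order[0]] = np.inf       # always keep the densest point as a center
    centers = np.argsort(-gamma, kind="stable")[:n_clusters]
    labels = np.full(n, -1)
    labels[centers] = np.arange(n_clusters)
    # Every remaining point inherits the label of its nearest denser point.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels

# Toy input: three base clusterings of six points (rows are clusterings).
base = np.array([
    [0, 0, 0, 1, 1, 1],
    [1, 1, 1, 0, 0, 0],
    [0, 0, 1, 1, 1, 1],
])
similarity = co_association(base)
labels = density_peaks(1.0 - similarity, n_clusters=2, dc=0.5)
print(labels)
```

The sketch turns the similarity into a distance with `1 - similarity`, so points that the base clusterers usually group together lie close to each other and contribute to each other's density. The cutoff `dc` and the number of clusters are chosen by hand here; how the paper selects them is not stated in the abstract.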
