A Fast-Search Density Peak Clustering Algorithm Based on Cluster Boundaries

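ISSN: 0469-5097
Journal: Journal of Nanjing University: Natural Science Edition (《南京大学学报：自然科学版》)
Date: 0
Classification: TP181 [Automation and Computer Technology: Control Science and Engineering; Automation and Computer Technology: Control Theory and Control Engineering]
Author affiliations: [1] College of Computer Science and Engineering, Shandong University of Science and Technology, Qingdao 266590; [2] Shandong Provincial Key Laboratory of Smart Mine Information Technology, Qingdao 266590
Funding: National Natural Science Foundation of China (61203305, 61433012); Shandong Province Key Research and Development Program (2016GSF120012); Natural Science Foundation of Shandong Province (ZR2015FM013); Shandong Province "Taishan Scholar" Climbing Program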

Keywords: density peaks, cluster centers, noise cleaning, clustering

Chinese abstract (translated):

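Compared with other clustering algorithms, clustering by fast search and find of density peaks (DPC) needs only a few parameters to achieve good clustering results. However, when a single cluster contains multiple density peaks, the results are unsatisfactory. To address this problem, a DPC algorithm based on cluster boundary partitioning, B-DPC, is proposed. The improved algorithm first cleans the dataset using a new noise-removal criterion and then calls DPC to perform a first round of clustering. Finally, it searches for the boundary samples of neighboring clusters and, according to the number of boundary samples and the proportion they account for, performs a second round of clustering on the first-round result. Experiments show that B-DPC handles the multi-density-peak clustering problem well, can discover clusters of arbitrary shape, and is insensitive to noise.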

English abstract:

In the data mining community, clustering is one of the most important research topics because data are complex and unlabeled. A great many techniques have been devoted to data clustering. The paper "Clustering by fast search and find of density peaks" (DPC), published in Science, focuses on density-based clustering. Compared with other clustering algorithms, DPC uses fewer parameters yet can obtain better clustering results. However, when a cluster contains multiple density peaks, the clustering results are not satisfactory. For this reason, a boundary partition-based DPC algorithm, B-DPC, is proposed. B-DPC improves the standard DPC in two respects: a criterion for cleaning noisy data, and a two-round clustering process. A new criterion for judging whether a data instance is noise is defined by calculating the distances among all data instances. A data instance is regarded as noise if the distances between it and all noisy data instances in the noisy dataset are less than a predetermined threshold. Such noisy instances are first removed from the dataset, and B-DPC then carries out the two-round process. The first round applies the standard DPC to choose candidate cluster centers, from which initial clusters are obtained and the decision graph is built. The second round merges similar clusters so that the number of clusters better matches the actual number; it does so by finding boundary data instances and using the count of these boundary instances and the ratio of boundary instances to the neighboring clusters. To test B-DPC, several classical artificial datasets and real-world datasets are used in the experiments, and several well-performing clustering algorithms, such as DPC, DBSCAN, and K-means, are used for comparison. Experimental results show that B-DPC can solve the multi-density-peak problem effectively and can also discover clusters with arbitrary shapes.
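
For orientation, the first-round step that the abstract refers to, standard DPC, can be sketched roughly as follows. This is a minimal illustration of the generally known DPC quantities (local density rho with a cutoff kernel, and delta, the distance to the nearest denser point). It is not the authors' B-DPC code, and the noise-cleaning criterion and boundary-based second round are not shown. The function name `dpc`, the parameters `d_c` and `n_clusters`, and the tie-breaking rule are assumptions made for this example.

```python
import numpy as np


def dpc(points, d_c, n_clusters):
    """Illustrative sketch of standard DPC (not the paper's B-DPC)."""
    n = len(points)
    # Pairwise Euclidean distances between all points.
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    # Local density rho_i: number of other points closer than the cutoff d_c.
    rho = (dist < d_c).sum(axis=1) - 1
    # Visit points from densest to sparsest; ties are broken by index
    # (an assumption of this sketch).
    order = np.argsort(-rho, kind="stable")
    delta = np.zeros(n)
    nearest_denser = np.full(n, -1)
    for rank, i in enumerate(order):
        if rank == 0:
            # The densest point has no denser neighbour; use its largest distance.
            delta[i] = dist[i].max()
            continue
        denser = order[:rank]
        j = denser[np.argmin(dist[i, denser])]
        delta[i] = dist[i, j]
        nearest_denser[i] = j
    # Candidate centers: points with the largest gamma = rho * delta.
    # Sorting within the density order keeps the densest point first on ties.
    gamma = rho * delta
    centers = order[np.argsort(-gamma[order], kind="stable")[:n_clusters]]
    labels = np.full(n, -1)
    labels[centers] = np.arange(n_clusters)
    # Each remaining point inherits the label of its nearest denser point.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[nearest_denser[i]]
    return rho, delta, labels
```

In this sketch the number of centers is passed in directly; in DPC as usually described, centers are instead read off the decision graph of rho against delta. A cluster with several density peaks would be split into several initial clusters at this stage, which is the situation the second round of B-DPC is described as addressing.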
