A Division-Join Clustering Algorithm (一种分割-合并聚类算法)

ISSN: 0469-5097
Journal: Journal of Nanjing University: Natural Sciences (《南京大学学报：自然科学版》)
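Classification: TP391 [Automation and Computer Technology: Computer Application Technology; Automation and Computer Technology: Computer Science and Technology]
Author affiliations: [1] School of Computer and Information Technology, Beijing Jiaotong University; Beijing Key Lab of Traffic Data Analysis and Mining, Beijing 100044; [2] Department of Information Management, Central Institute for Correctional Police (中央司法警官学院), Baoding 071000
Funding: National Natural Science Foundation of China (61033013, 61370129, 61375062, 61300072, 61105056, 61402462); Specialized Research Fund for the Doctoral Program of Higher Education (20120009110006); Youth Foundation of the Hebei Provincial Department of Education (QN2015099); Hebei Social Science Foundation (HB15TQ013)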

Keywords: clustering, division-join method, mountain clustering, Cluster-dp algorithm

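Chinese abstract (translated):

The mountain range formed by piling up the densities of data points reflects the structure of the data, which gave rise to mountain clustering. Unfortunately, existing mountain clustering algorithms are strongly affected by how the data are distributed. This paper proposes a new clustering method, the division-join clustering framework (DJCF), which finds all the peaks in the density-built mountain range and then merges them according to the relations among them; the merged result corresponds to the final clusters. With a two-phase procedure, DJCF can cluster data of any shape and distribution. The first phase divides the data set into multiple partitions, and each real cluster is composed of several partitions. In this phase, a density measure is designed using K-nearest neighbors (KNN) and then plugged into the Cluster-dp algorithm; with the new density measure, Cluster-dp finds the partitions of the data set more accurately. The second phase combines the partitions found into the final clusters according to the relations among them. Experiments on synthetic and real data confirm that the algorithm is simple and effective.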

English abstract:

The mountain heaped up from the densities of data points intuitively reflects the structure of a data set. Unfortunately, previous mountain-based clustering methods are sensitive to the layout of the hills. We propose a new clustering framework, the division-join clustering framework (DJCF). It finds almost all of the hills and merges them into the mountains that correspond to the clusters. Working in two phases, the DJCF algorithm can recognize clusters regardless of their shape and the layout of the data points. The first phase divides the data set into partitions, of which the real clusters are composed. In this phase, we use K-nearest neighbors (KNN) to design a new way of computing density and equip the Cluster-dp algorithm with it, so that the partitions of the data set are found more accurately. In the second phase, the partitions are merged into the final clusters according to the relations among them. Experiments on artificial and real data sets demonstrate the simplicity and effectiveness of the new approach.
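
The two phases described in the abstract can be illustrated with a minimal Python sketch. The abstract gives neither the density formula nor the merge rule, so both are assumptions here: density is taken as the inverse of the mean distance to the k nearest neighbours, and two partitions are joined when they share at least `min_links` KNN links. The function names and the parameters `k`, `n_partitions`, and `min_links` are hypothetical, and this is not the authors' implementation.

```python
import math


def pairwise_distances(points):
    """Dense Euclidean distance matrix (adequate for small data sets)."""
    return [[math.dist(p, q) for q in points] for p in points]


def knn_indices(dist, k):
    """For each point, the indices of its k nearest neighbours (self excluded)."""
    result = []
    for i, row in enumerate(dist):
        others = sorted((j for j in range(len(row)) if j != i), key=row.__getitem__)
        result.append(others[:k])
    return result


def knn_density(dist, neighbours):
    """ASSUMED density: inverse of the mean distance to the k nearest neighbours."""
    return [len(nb) / (sum(dist[i][j] for j in nb) + 1e-12)
            for i, nb in enumerate(neighbours)]


def divide(dist, neighbours, n_partitions):
    """Phase 1: density-peak style partitioning driven by the KNN density."""
    n = len(dist)
    rho = knn_density(dist, neighbours)
    order = sorted(range(n), key=lambda i: -rho[i])    # densest point first
    parent = [-1] * n                                  # nearest denser point
    gamma = [0.0] * n                                  # density * distance to parent
    gamma[order[0]] = math.inf                         # densest point is always a peak
    for rank in range(1, n):
        i = order[rank]
        j = min(order[:rank], key=lambda q: dist[i][q])
        parent[i] = j
        gamma[i] = rho[i] * dist[i][j]
    peaks = sorted(range(n), key=lambda i: -gamma[i])[:n_partitions]
    labels = [-1] * n
    for label, p in enumerate(peaks):
        labels[p] = label
    for i in order:                                    # parents are labelled before children
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels


def join(neighbours, labels, min_links):
    """Phase 2 (ASSUMED rule): merge partitions sharing at least min_links KNN links."""
    m = max(labels) + 1
    links = [[0] * m for _ in range(m)]
    for i, nb in enumerate(neighbours):
        for j in nb:
            a, b = labels[i], labels[j]
            if a != b:
                links[a][b] += 1
                links[b][a] += 1
    root = list(range(m))                              # union-find over partitions

    def find(a):
        while root[a] != a:
            root[a] = root[root[a]]
            a = root[a]
        return a

    for a in range(m):
        for b in range(a + 1, m):
            if links[a][b] >= min_links:
                root[find(a)] = find(b)
    return [find(a) for a in labels]


def djcf_sketch(points, k=8, n_partitions=6, min_links=1):
    """Divide into n_partitions, then join; returns one cluster id per point."""
    dist = pairwise_distances(points)
    neighbours = knn_indices(dist, k)
    return join(neighbours, divide(dist, neighbours, n_partitions), min_links)
```

Calling `djcf_sketch(points)` on a list of coordinate tuples returns one cluster id per point. Over-partitioning first and merging afterwards is what lets this kind of scheme follow clusters that a single density peak cannot describe; the link-count rule used here is only one plausible way to express the "relations among partitions" that the abstract mentions.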
