The mountain formed by piling up the densities of data points intuitively reflects the structure of a data set, an observation that gave rise to mountain clustering. Unfortunately, existing mountain-based clustering methods are strongly affected by how the data are distributed, that is, by the layout of the hills. We propose a new clustering method, the division-join clustering framework (DJCF). It finds almost all of the hills in the density mountain and then merges them, according to the relations among them, into mountains that correspond to the final clusters. Working in two phases, DJCF can recognize clusters regardless of their shape and of the layout of the data points. The first phase divides the data set into partitions, where each real cluster is composed of a number of partitions. In this phase we design a new density measure based on K-nearest neighbors (KNN) and equip the Cluster-dp algorithm with it, so that Cluster-dp finds the partitions of the data set more accurately. The second phase merges the partitions into the final clusters according to the relations among them. Experiments on artificial and real data sets demonstrate the simplicity and effectiveness of the approach.
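
The first phase described above can be illustrated with a minimal sketch. It rests on several assumptions that the abstract does not confirm: the density is taken as the inverse of the mean distance to the k nearest neighbors (the exact KNN density formula is not given here), partitions are obtained in the style of density-peak clustering (each point attaches to its nearest neighbor of higher density), and the names `knn_density`, `density_peak_partitions`, and `n_partitions` are hypothetical. The second phase is omitted because the merge criterion is not specified in the abstract.

```python
import numpy as np


def knn_density(X, k):
    """Assumed KNN density: inverse mean distance to the k nearest neighbors."""
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))       # pairwise Euclidean distances
    # After sorting each row, column 0 is the point itself (distance 0).
    knn_dist = np.sort(dist, axis=1)[:, 1:k + 1]
    rho = 1.0 / (knn_dist.mean(axis=1) + 1e-12)    # small constant avoids division by zero
    return rho, dist


def density_peak_partitions(X, k, n_partitions):
    """Split X into n_partitions groups around density peaks (illustrative only)."""
    rho, dist = knn_density(X, k)
    n = len(X)
    order = np.argsort(-rho)                       # indices by decreasing density
    delta = np.empty(n)                            # distance to nearest denser point
    parent = np.full(n, -1)                        # index of that nearest denser point

    top = order[0]
    delta[top] = dist.max()                        # convention for the densest point
    for rank in range(1, n):
        i = order[rank]
        higher = order[:rank]                      # all points denser than i
        j = higher[np.argmin(dist[i, higher])]
        delta[i] = dist[i, j]
        parent[i] = j

    # Peaks combine high density with a large distance to any denser point.
    gamma = rho * delta
    gamma[top] = np.inf                            # the densest point is always a peak
    centers = np.argsort(-gamma)[:n_partitions]

    labels = np.full(n, -1)
    labels[centers] = np.arange(n_partitions)
    # Visit points from dense to sparse so each parent is labeled before its child.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels, centers
```

Calling `density_peak_partitions(X, k=5, n_partitions=2)` on an `(n, d)` array returns one partition label per point and the indices of the chosen peaks. In a division-join setting, `n_partitions` would typically be set larger than the expected number of clusters, so that the later merge step can recombine partitions belonging to the same cluster.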