CFSFDP (clustering by fast search and find of density peaks) is a recent density-based clustering algorithm that can cluster non-spherical data sets and has the advantages of fast clustering and simple implementation. However, CFSFDP requires repeated manual trials to determine the density threshold dc, and it cannot cluster data accurately when a single class contains multiple density peaks. To address these shortcomings, this paper proposes NM-CFSFDP, an optimization of CFSFDP based on the neighbor distance curve and cluster merging. First, the algorithm determines the density threshold dc automatically from the change in the nearest-neighbor distance curve. Second, it runs CFSFDP with this dc to cluster the data set, and uses the same method that computes dc to guide the merging of classes. A parameter measuring the degree of cohesion is introduced so that a merge can be revoked dynamically, which solves the problem that merged classes could not otherwise be undone and allows data with multiple density peaks to be clustered correctly. Comparative experiments show that NM-CFSFDP clusters more accurately than CFSFDP.
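As a rough illustration of the baseline that NM-CFSFDP builds on, the following is a minimal sketch of the standard CFSFDP steps: local density within dc, distance to the nearest higher-density point, and assignment of each point to the cluster of that neighbor. The function names, the tie-breaking rule, and the choice of centers as the top `n_clusters` values of density times distance are assumptions made for this sketch. In particular, `estimate_dc` is a hypothetical elbow heuristic on the sorted nearest-neighbor distance curve; the abstract does not specify the paper's actual rule, nor the cohesion parameter used for merging, so the merging step is not sketched here.

```python
import math


def pairwise_distances(points):
    """Euclidean distance matrix as a list of lists."""
    n = len(points)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            dist[i][j] = dist[j][i] = math.dist(points[i], points[j])
    return dist


def estimate_dc(dist):
    """Hypothetical heuristic (not the paper's rule): take the elbow of the
    sorted nearest-neighbor distance curve, i.e. the value lying farthest
    below the straight line joining the curve's first and last points."""
    n = len(dist)
    nn = sorted(min(row[j] for j in range(n) if j != i)
                for i, row in enumerate(dist))
    span = nn[-1] - nn[0]
    if n < 3 or span == 0:
        return nn[-1]
    gaps = [nn[0] + span * k / (n - 1) - nn[k] for k in range(n)]
    return nn[gaps.index(max(gaps))]


def cfsfdp(points, dc, n_clusters):
    """Baseline CFSFDP: returns one cluster label per point."""
    dist = pairwise_distances(points)
    n = len(points)
    # Local density: number of other points strictly closer than dc.
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < dc)
           for i in range(n)]
    # Visit points from densest to sparsest; on equal density, the point
    # with the smaller index counts as "denser" (a tie-breaking assumption).
    order = sorted(range(n), key=lambda i: -rho[i])
    delta = [0.0] * n
    parent = [-1] * n
    for pos, i in enumerate(order):
        if pos == 0:
            delta[i] = max(dist[i])  # densest point: largest distance
            continue
        # Nearest point among those that are denser than point i.
        j = min(order[:pos], key=lambda q: dist[i][q])
        delta[i], parent[i] = dist[i][j], j
    # Centers: largest rho * delta (stable sort keeps the densest point first).
    centers = sorted(order, key=lambda i: -rho[i] * delta[i])[:n_clusters]
    labels = [-1] * n
    for label, c in enumerate(centers):
        labels[c] = label
    # Each remaining point inherits the label of its denser nearest neighbor.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels


points = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
          (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5)]
labels = cfsfdp(points, dc=1.0, n_clusters=2)
```

In this toy example dc is passed explicitly; in practice `estimate_dc(pairwise_distances(points))` could supply a starting value, though how well such a heuristic works depends on the data.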