CFSFDP (clustering by fast search and find of density peaks) specifies a global density threshold dc without considering the spatial distribution of the data. This degrades clustering quality, and data sets with multiple density peaks cannot be clustered accurately. To resolve these shortcomings, a clustering algorithm that optimizes CFSFDP with projection partitioning and class merging (PM-CFSFDP) is proposed. Projection analysis is used to divide the data set into smaller partitions, and each partition is clustered locally, so that no global dc is needed. A cohesion measure is then introduced to guide the merging of sub-classes. As a result, data sets with uneven data density, uneven inter-class distances, or multiple density peaks are clustered accurately. Simulation results on four typical data sets show that PM-CFSFDP produces more accurate clusterings than CFSFDP and AGD-DBSCAN.
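To show why a single global dc is a problem and what partition-then-cluster changes, here is a minimal Python sketch under stated assumptions. The CFSFDP part follows the standard decision quantities: local density rho (cutoff kernel with threshold dc), distance delta to the nearest higher-density point, and centers chosen by the largest gamma = rho * delta. The partitioning step is a hypothetical stand-in, because the abstract does not specify the projection analysis: points are projected onto one axis and cut at gaps wider than `min_gap`. The per-partition dc is an assumed quantile heuristic (`frac`), and the cohesion-guided merge of sub-classes is not implemented because the measure is not defined here. All function and parameter names (`density_and_delta`, `cfsfdp`, `projection_partition`, `local_dc`, `partitioned_cfsfdp`, `min_gap`, `frac`, `k_per_part`) are illustrative, not the paper's.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def density_and_delta(points: Sequence[Point], dc: float):
    """CFSFDP quantities with the cutoff kernel.
    rho[i]: number of other points closer than dc
    delta[i]: distance to the nearest higher-density point (densest point: max distance)
    parent[i]: index of that nearest higher-density point (-1 for the densest)"""
    n = len(points)
    d = [[math.dist(p, q) for q in points] for p in points]
    rho = [sum(1 for j in range(n) if j != i and d[i][j] < dc) for i in range(n)]
    # Strict density order; ties broken by index so "higher density" is well defined.
    order = sorted(range(n), key=lambda i: (-rho[i], i))
    delta, parent = [0.0] * n, [-1] * n
    for rank, i in enumerate(order):
        if rank == 0:
            delta[i] = max(d[i])
        else:
            j = min(order[:rank], key=lambda k: d[i][k])
            delta[i], parent[i] = d[i][j], j
    return rho, delta, parent, order


def cfsfdp(points: Sequence[Point], dc: float, n_clusters: int) -> List[int]:
    """Centers = the n_clusters points with the largest gamma = rho * delta; every other
    point, visited in decreasing density, inherits the label of its parent."""
    if not points:
        return []
    rho, delta, parent, order = density_and_delta(points, dc)
    gamma = [r * dl for r, dl in zip(rho, delta)]
    centers = sorted(range(len(points)), key=lambda i: -gamma[i])[:n_clusters]
    labels = [-1] * len(points)
    for label, i in enumerate(centers):
        labels[i] = label
    top = order[0]
    if labels[top] == -1:  # the densest point has no parent: attach it to the closest center
        labels[top] = labels[min(centers, key=lambda c: math.dist(points[top], points[c]))]
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels


def projection_partition(points: Sequence[Point], axis: int = 0,
                         min_gap: float = 1.0) -> List[List[int]]:
    """HYPOTHETICAL stand-in for projection analysis: project onto one axis and
    start a new partition wherever consecutive projections differ by more than min_gap."""
    idx = sorted(range(len(points)), key=lambda i: points[i][axis])
    parts: List[List[int]] = [[idx[0]]] if idx else []
    for a, b in zip(idx, idx[1:]):
        if points[b][axis] - points[a][axis] > min_gap:
            parts.append([])
        parts[-1].append(b)
    return parts


def local_dc(points: Sequence[Point], frac: float = 0.02) -> float:
    """ASSUMED heuristic: dc = the frac-quantile of pairwise distances inside one partition."""
    ds = sorted(math.dist(p, q) for i, p in enumerate(points) for q in points[i + 1:])
    return ds[min(len(ds) - 1, int(frac * len(ds)))] if ds else 0.0


def partitioned_cfsfdp(points: Sequence[Point], axis: int = 0, min_gap: float = 1.0,
                       frac: float = 0.02, k_per_part: int = 1) -> List[int]:
    """Cluster each partition with its own dc and offset the labels.
    The cohesion-guided merging of sub-classes is NOT implemented here."""
    labels, offset = [-1] * len(points), 0
    for part in projection_partition(points, axis, min_gap):
        sub = [points[i] for i in part]
        k = min(k_per_part, len(sub))
        for i, lab in zip(part, cfsfdp(sub, local_dc(sub, frac), k)):
            labels[i] = offset + lab
        offset += k
    return labels


dense = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (0.05, 0.05)]
sparse = [(10.0, 0.0), (11.0, 0.0), (10.0, 1.0), (11.0, 1.0), (10.5, 0.5)]
data = dense + sparse
print(cfsfdp(data, dc=0.2, n_clusters=2))
print(partitioned_cfsfdp(data))  # [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
```

In this toy example, a global dc = 0.2 gives every point of the sparse group a density of zero, so both centers are picked inside the dense group: the dense group is split in two and the sparse group is absorbed into one of the halves. Clustering each projected partition separately, with its own dc, recovers the two groups. The real PM-CFSFDP additionally merges sub-classes using its cohesion measure, which this sketch omits.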