To address the shortcomings of DBSCAN (Density-Based Spatial Clustering of Applications with Noise), this paper combines domain knowledge with the partitioning idea and proposes the concept of attribute-dimension partitioning. It then demonstrates a pruning principle based on local-cluster merging and core-point computation. Finally, drawing on the characteristics of the cloud-computing programming model MapReduce, it presents an optimized DBSCAN method and validates it on the clustering analysis of data from a real road transport information system. The results show that the partitioned dataset lends itself to parallel clustering and data mining, and that the proposed optimization method outperforms common statistical analysis methods.
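The partition-then-cluster idea summarized above can be illustrated with a minimal sketch. It is not the paper's implementation: the function names (`map_phase`, `reduce_phase`), the record layout, and the choice of a categorical attribute as the partition key are all assumptions made for illustration. The map and reduce steps are simulated in a single process rather than on a MapReduce cluster, and the pruning and local-cluster merging steps are omitted.

```python
from collections import defaultdict
from math import dist


def region_query(points, i, eps):
    """Indices of all points within eps of points[i] (including i itself)."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Plain DBSCAN; returns one label per point, with -1 marking noise."""
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = -1  # provisionally noise; may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reachable from a core point: border
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = region_query(points, j, eps)
            if len(nbrs) >= min_pts:  # j is a core point, keep expanding
                queue.extend(nbrs)
    return labels


def map_phase(records, key_of):
    """Hypothetical map step: group records by an attribute-dimension key."""
    groups = defaultdict(list)
    for rec in records:
        groups[key_of(rec)].append(rec)
    return groups


def reduce_phase(groups, coords_of, eps, min_pts):
    """Hypothetical reduce step: cluster each partition independently."""
    result = {}
    for key, recs in groups.items():
        labels = dbscan([coords_of(r) for r in recs], eps, min_pts)
        result[key] = list(zip(recs, labels))
    return result


# Made-up records: (vehicle type, x, y). The vehicle type plays the role of
# the attribute dimension chosen from domain knowledge.
records = [
    ("truck", 0.0, 0.0), ("truck", 0.1, 0.0), ("truck", 0.0, 0.1),
    ("bus", 5.0, 5.0), ("bus", 5.1, 5.0), ("bus", 5.0, 5.1), ("bus", 9.0, 9.0),
]
groups = map_phase(records, key_of=lambda r: r[0])
clustered = reduce_phase(groups, coords_of=lambda r: r[1:], eps=0.5, min_pts=3)
```

Because each partition is clustered independently, the per-key calls inside `reduce_phase` could run in parallel, which is the property the abstract attributes to the partitioned dataset. Cluster labels are local to each partition in this sketch.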