The traditional DBSCAN algorithm relies on empirically chosen thresholds (minPts, Eps) and cannot cluster multi-density data sets effectively. To address these shortcomings, this paper proposes an adaptive DBSCAN algorithm based on the Gaussian distribution that is applicable to data sets with any density distribution. The algorithm derives its thresholds from the characteristics of the data set. First, minPts is chosen as the value that maximizes the clustering effect index (CEI). Next, the number of Eps values is set to the number of density levels in the Distk plot, and the Eps value for each density level is obtained by parameter estimation under a Gaussian distribution. Finally, the data set is clustered with the resulting thresholds. The proposed algorithm and the traditional DBSCAN algorithm were both applied to single-density and multi-density data sets, and the results show that the proposed algorithm is more effective.
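The Eps estimation step can be illustrated with a minimal sketch. It is not the paper's implementation, because the abstract does not define the CEI, the rule for counting density levels, or the exact mapping from the fitted Gaussian to Eps. The sketch therefore assumes that k (standing in for minPts) and the number of levels are already known, splits the sorted Distk values at their largest jumps, and sets Eps to the fitted mean plus `n_sigma` standard deviations. The function names, the gap-based split, and the `n_sigma` rule are all hypothetical choices made for illustration.

```python
import math
import statistics


def kth_neighbor_distances(points, k):
    """Return each point's distance to its k-th nearest neighbor, sorted ascending.

    These sorted values are what a Distk plot displays.
    """
    dists = []
    for i, p in enumerate(points):
        others = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        dists.append(others[k - 1])
    return sorted(dists)


def split_into_levels(sorted_dists, n_levels):
    """Split sorted Distk values into n_levels groups at the largest jumps.

    Assumption: the abstract only says levels come from the Distk plot;
    cutting at the largest gaps is a stand-in for that step.
    """
    gaps = [(sorted_dists[i + 1] - sorted_dists[i], i)
            for i in range(len(sorted_dists) - 1)]
    cut_after = sorted(i for _, i in sorted(gaps, reverse=True)[:n_levels - 1])
    levels, start = [], 0
    for i in cut_after:
        levels.append(sorted_dists[start:i + 1])
        start = i + 1
    levels.append(sorted_dists[start:])
    return levels


def eps_per_level(levels, n_sigma=1.0):
    """Fit a Gaussian to each level and derive one Eps per level.

    The mean and population standard deviation are the maximum-likelihood
    Gaussian parameters. Eps = mu + n_sigma * sigma is an assumed rule.
    """
    eps_values = []
    for level in levels:
        mu = statistics.fmean(level)
        sigma = statistics.pstdev(level)
        eps_values.append(mu + n_sigma * sigma)
    return eps_values


if __name__ == "__main__":
    # Toy data: one dense cluster (spacing 1) and one sparse cluster (spacing 10).
    points = [(0, 0), (1, 0), (0, 1), (1, 1),
              (100, 100), (110, 100), (100, 110), (110, 110)]
    distk = kth_neighbor_distances(points, k=2)
    print(eps_per_level(split_into_levels(distk, n_levels=2)))  # prints [1.0, 10.0]
```

On the toy data the two density levels yield two distinct Eps values, one per cluster density, which is the behavior a single global Eps cannot provide. Each Eps would then be paired with minPts in a DBSCAN pass over the points of its density level.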