To meet the need for rapid outlier detection in large-scale datasets, this paper proposes a differentiation distance-based outlier detection algorithm (DODA). The algorithm takes into account the factors that determine whether an object is an outlier, such as the density of the data objects around it and the distances between objects. By comparing the differentiation distance between each object and every other object, DODA computes the density of the object's neighboring points and thereby uncovers the outliers hidden in the dataset. Experimental results show that the algorithm identifies outliers effectively and also reflects how isolated each data object is within the dataset. Because its complexity is low, the algorithm is suitable for rapid outlier detection in large-scale datasets.
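The abstract does not give a formula for the differentiation distance, so the following is only a minimal sketch of the general idea it describes: score each object by how many neighbors lie close to it, and treat sparsely surrounded objects as outliers. It assumes plain Euclidean distance (`math.dist`) as a stand-in for the differentiation distance, and the function names `neighbor_density`, `isolation_scores` and `detect_outliers` and the parameters `radius` and `threshold` are hypothetical. It is a naive O(n²) pairwise scan and does not attempt to reproduce the efficiency of the proposed algorithm.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def neighbor_density(points: Sequence[Point], radius: float) -> List[int]:
    """For each point, count the other points within `radius`.

    Assumption: Euclidean distance stands in for the paper's
    differentiation distance, which is not defined in the abstract.
    """
    counts = []
    for i, p in enumerate(points):
        count = 0
        for j, q in enumerate(points):
            if i != j and math.dist(p, q) <= radius:
                count += 1
        counts.append(count)
    return counts


def isolation_scores(points: Sequence[Point], radius: float) -> List[float]:
    """Hypothetical isolation degree in [0, 1]: 1.0 means no neighbors at all."""
    n = len(points)
    if n <= 1:
        return [1.0] * n
    counts = neighbor_density(points, radius)
    # Fraction of the other n - 1 points that are NOT neighbors.
    return [1.0 - c / (n - 1) for c in counts]


def detect_outliers(points: Sequence[Point], radius: float, threshold: float) -> List[int]:
    """Return indices of points whose isolation degree is at least `threshold`."""
    scores = isolation_scores(points, radius)
    return [i for i, s in enumerate(scores) if s >= threshold]


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
    print(detect_outliers(data, radius=1.5, threshold=0.9))  # → [4]
```

In this example the four clustered points each have three neighbors within radius 1.5 and receive an isolation degree of 0.25, while (10, 10) has no neighbors and receives 1.0, so only index 4 is flagged. Returning a graded score rather than a yes/no label mirrors the abstract's point that the method also reflects how isolated each object is.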