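Outliers can be divided into global outliers and local outliers. In many cases, mining local outliers is more meaningful than mining global outliers. This paper proposes DLOF, a density-based local outlier detection algorithm. The method uses information entropy to determine the outlier attributes of each object and, when computing the distance between objects, uses a weighted distance that assigns larger weights to the outlier attributes, thereby improving the accuracy of outlier detection. In addition, the algorithm applies a two-step optimization technique when computing the outlier factors, and the time complexity of the algorithm after these two optimization steps is analyzed in detail. Theoretical analysis and experimental results show that the method is effective and feasible.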
With the rapid growth of data, data mining is becoming increasingly important. Outlier detection is one of its key techniques: it finds exceptional objects that deviate from most of the rest of the data set. There are two kinds of outliers: global outliers and local outliers. In many scenarios, detecting local outliers is more valuable than detecting global outliers. The LOF algorithm is a well-known local outlier detection algorithm that assigns each object an outlier-degree value. However, when it calculates this value, LOF treats all attributes equally. In fact, different attributes have different effects; the attributes with the greater effect are known as outlier attributes. In this paper, a density-based local outlier detection algorithm (DLOF) is proposed, which derives the outlier attributes of each data object by means of information entropy. A weighted distance is introduced to calculate the distance between two data objects, in which the outlier attributes are assigned larger weights, so the algorithm improves outlier detection accuracy. In addition, we present two improvements to the calculation of the local outlier factors, together with an analysis of their time complexity. Theoretical analysis and experimental results show that DLOF is efficient and effective.
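The idea of combining an entropy-derived attribute weighting with a density-based outlier factor can be illustrated with a minimal sketch. It is not the DLOF algorithm itself: the abstract does not state how entropy is mapped to weights, so `entropy_weights` is a hypothetical rule (weight proportional to the entropy of an equal-width histogram), a single global weight vector is used instead of per-object outlier attributes, and the two optimization steps are omitted. The outlier factor follows the standard LOF definition, with ties in the k-distance ignored for brevity. All function names and the sample data are assumptions made for illustration.

```python
import math
from collections import Counter


def attribute_entropy(values, bins=5):
    """Shannon entropy (bits) of one attribute after equal-width binning."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return 0.0  # a constant attribute carries no information
    width = (hi - lo) / bins
    counts = Counter(min(int((v - lo) / width), bins - 1) for v in values)
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def entropy_weights(data, bins=5):
    """Hypothetical rule (an assumption): weight proportional to entropy."""
    entropies = [attribute_entropy(col, bins) for col in zip(*data)]
    total = sum(entropies)
    if total == 0:
        return [1.0 / len(entropies)] * len(entropies)
    return [e / total for e in entropies]


def weighted_distance(p, q, weights):
    """Weighted Euclidean distance between two objects."""
    return math.sqrt(sum(w * (a - b) ** 2 for w, a, b in zip(weights, p, q)))


def lof_scores(data, weights, k=3):
    """Standard LOF computed on the weighted distance.

    Simplifications: ties in the k-distance are ignored, and duplicate
    points (zero reachability distance) are not handled specially.
    """
    n = len(data)
    dist = [[weighted_distance(data[i], data[j], weights) for j in range(n)]
            for i in range(n)]
    neighbours, k_dist = [], []
    for i in range(n):
        order = sorted((j for j in range(n) if j != i),
                       key=lambda j: dist[i][j])
        neighbours.append(order[:k])
        k_dist.append(dist[i][order[k - 1]])
    # Local reachability density: inverse of the mean reachability distance.
    lrd = []
    for i in range(n):
        reach = sum(max(k_dist[j], dist[i][j]) for j in neighbours[i])
        lrd.append(k / reach if reach > 0 else float("inf"))
    # LOF: mean ratio of the neighbours' density to the object's own density.
    return [sum(lrd[j] for j in neighbours[i]) / (k * lrd[i])
            for i in range(n)]


# Illustrative data: five clustered objects and one object far away.
data = [(1.0, 1.0), (1.1, 0.9), (0.9, 1.2), (1.2, 1.1), (1.0, 0.8), (5.0, 1.0)]
weights = entropy_weights(data)
scores = lof_scores(data, weights, k=3)
print([round(s, 2) for s in scores])
```

Objects whose score is well above 1 lie in a sparser region than their neighbours. Swapping `entropy_weights` for a different rule only changes the weight vector passed to `lof_scores`, which keeps the weighting choice separate from the density computation.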