Most existing outlier detection methods require the number of outliers to be specified in advance, and an inaccurate setting reduces detection accuracy. To address this problem, a probability-based outlier detection method is proposed. The method combines the density-based DBSCAN algorithm with a variance-from-median computation to cluster the data set under test, extracts the suspicious outliers that do not belong to any cluster, and checks them against the definition of an outlier to determine the final outliers. The data handled by the method are linearly independent of the time factor, neither the number of outliers nor the number of clusters needs to be set in advance, and the method is robust to noisy data. Experimental results on the IRIS test data set show that the method identifies outliers effectively.
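The two-stage idea described above (density-based clustering first, then a check on the points left outside every cluster) can be illustrated with a minimal sketch. The abstract does not give the exact variance-from-median formula or the probabilistic outlier definition, so the second stage here is an assumption made for illustration only: a suspicious point is confirmed as an outlier when its squared distance from the coordinate-wise median exceeds `factor` times the mean squared distance of all points from that median. The names `variance_from_median`, `detect_outliers`, and `factor`, as well as the toy data, are hypothetical and not taken from the paper.

```python
import math
import statistics

NOISE = -1  # label for points that belong to no cluster


def region_query(points, i, eps):
    """Indices of all points within distance eps of points[i] (including i)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Plain DBSCAN; returns one cluster label per point, NOISE for unclustered points."""
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is a core point, keep expanding
    return labels


def variance_from_median(points):
    """Assumed definition: mean squared distance to the coordinate-wise median."""
    dims = range(len(points[0]))
    med = tuple(statistics.median(p[d] for p in points) for d in dims)
    var = sum(math.dist(p, med) ** 2 for p in points) / len(points)
    return med, var


def detect_outliers(points, eps, min_pts, factor=3.0):
    """Stage 1: DBSCAN noise points are suspicious. Stage 2: assumed median-based check."""
    labels = dbscan(points, eps, min_pts)
    suspicious = [i for i, label in enumerate(labels) if label == NOISE]
    med, var = variance_from_median(points)
    return [i for i in suspicious if math.dist(points[i], med) ** 2 > factor * var]


# Toy data: a 3x3 grid cluster, one mildly isolated point, one distant point.
points = [(x, y) for x in range(3) for y in range(3)] + [(4, 1), (10, 10)]
print(detect_outliers(points, eps=1.5, min_pts=3))  # prints [10]
```

In this toy run both `(4, 1)` and `(10, 10)` fall outside the cluster and are treated as suspicious, but only `(10, 10)` passes the second check. Neither the number of outliers nor the number of clusters is supplied; only the DBSCAN parameters `eps` and `min_pts` and the assumed threshold `factor` are set.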