To accommodate data sets with diverse distribution shapes and varying density, and to address the poor performance of existing algorithms in detecting outlying clusters, this paper proposes KNMOD (outlier detection based on K-nearest neighborhood MST), an outlier detection algorithm based on a K-nearest-neighbor tree. Taking both density and direction into account, the algorithm defines a new dissimilarity measure based on the K-nearest neighborhood, builds a minimum spanning tree (MST) on this measure, and then progressively cuts the tree under constraints to obtain the outliers. The algorithm handles data sets of arbitrary shape and density and effectively detects both local outliers and local outlying clusters. Comparisons with the LOF, COF, KNN and INFLO algorithms confirm its superior performance.
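The pipeline summarized above (neighborhood-based dissimilarity, minimum spanning tree, constrained cutting) can be illustrated with a minimal sketch. It is not the KNMOD measure itself: the abstract does not give the formula, so the dissimilarity here is a stand-in that divides Euclidean distance by the smaller of the two endpoints' mean K-nearest-neighbor distances and ignores the direction factor entirely. The function names `prim_mst` and `detect_outliers`, the parameters `cut_ratio` and `max_cluster_size`, and the median-based cutting rule are all assumptions made for illustration.

```python
import math
import statistics
from collections import defaultdict


def prim_mst(n, weight):
    """Prim's algorithm on a complete graph; returns (u, v, w) edges."""
    in_tree = [False] * n
    best = [math.inf] * n
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        # Pick the cheapest node not yet attached to the tree.
        u = min((i for i in range(n) if not in_tree[i]), key=best.__getitem__)
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u, best[u]))
        for v in range(n):
            if not in_tree[v]:
                w = weight(u, v)
                if w < best[v]:
                    best[v], parent[v] = w, u
    return edges


def detect_outliers(points, k=3, cut_ratio=2.0, max_cluster_size=2):
    """Flag points that end up in small MST components after cutting.

    Assumes more than k distinct points. All parameters are hypothetical.
    """
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    # Local scale: mean distance to the k nearest neighbours (density proxy).
    scale = [sum(sorted(row[:i] + row[i + 1:])[:k]) / k
             for i, row in enumerate(dist)]

    def dissim(i, j):
        # Stand-in measure (NOT the paper's): density-scaled distance.
        return dist[i][j] / min(scale[i], scale[j])

    edges = prim_mst(n, dissim)
    # Assumed cutting rule: drop edges far heavier than the typical edge.
    threshold = cut_ratio * statistics.median([w for _, _, w in edges])

    # Union-find over the edges that survive the cut.
    root = list(range(n))

    def find(x):
        while root[x] != x:
            root[x] = root[root[x]]
            x = root[x]
        return x

    for u, v, w in edges:
        if w <= threshold:
            root[find(u)] = find(v)

    groups = defaultdict(list)
    for i in range(n):
        groups[find(i)].append(i)
    # Assumed constraint: only small components count as outliers.
    return sorted(i for g in groups.values()
                  if len(g) <= max_cluster_size for i in g)


# Toy data: a 5x5 grid, one isolated point, and a tight two-point cluster.
grid = [(float(x), float(y)) for x in range(5) for y in range(5)]
points = grid + [(20.0, 20.0), (-15.0, -15.0), (-15.5, -15.0)]
flagged = detect_outliers(points)
```

On this toy data the sketch flags the isolated point and both members of the two-point cluster (indices 25 to 27) while leaving the grid intact, which mirrors the abstract's distinction between local outliers and local outlying clusters. A plain distance-based MST would keep the two-point cluster's internal edge as well, but the density scaling is what makes the edges leading out of the dense grid stand out.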