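To detect outliers in high-dimensional data streams effectively, an angle-distribution-based outlier detection algorithm for high-dimensional data streams (DSOD) is proposed. An angle-distribution-based method is used to accurately identify normal points, border points, and outliers in high-dimensional data sets; a small-scale calculation set for the data stream, built from the normal set and the border set, is constructed to reduce the algorithm's space and time overhead; and an update mechanism for the normal set and the border set is established to address concept drift in large data streams. Experimental results on real data sets show that the proposed DSOD algorithm is more efficient than the Simple VOA and ABOD algorithms and is suitable for outlier detection on large data streams.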
To improve outlier detection in high-dimensional data streams, a novel high-dimensional data stream outlier detection (DSOD) algorithm based on angle distribution is proposed. To identify normal points, border points, and outliers accurately, an angle distribution-based outlier detection method is employed. To reduce the computational cost in both space and time, a small-scale calculation set for the data stream is established, composed of a normal set and a border set. To address concept drift, an update mechanism for the normal set and the border set is developed. Experimental results on real data sets demonstrate that DSOD is more efficient than Simple variance of angles (Simple VOA) and angle-based outlier detection (ABOD), and that it is well suited to outlier detection on large data streams.
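The angle-distribution idea that underlies this family of methods can be illustrated with a minimal sketch. It is not the DSOD algorithm itself: it only computes a plain variance-of-angles score, and the function name `angle_variance`, the synthetic Gaussian data, and the choice of a fixed reference set are assumptions made for illustration. The intuition is that a point inside a cluster sees the other points in widely varying directions, whereas a distant point sees them all within a narrow cone, so its angle variance is small.

```python
import math
import random
from itertools import combinations
from statistics import pvariance


def angle_variance(point, reference):
    """Variance of the angles, measured at `point`, between all pairs of
    reference points. A low value suggests `point` lies outside the data."""
    # Unit direction vectors from `point` to each distinct reference point.
    directions = []
    for ref in reference:
        diff = [r - p for r, p in zip(ref, point)]
        norm = math.sqrt(sum(d * d for d in diff))
        if norm > 0:  # skip exact duplicates of `point`
            directions.append([d / norm for d in diff])

    # Angle between every pair of directions (cosine clamped for safety).
    angles = []
    for u, v in combinations(directions, 2):
        cosine = sum(a * b for a, b in zip(u, v))
        angles.append(math.acos(max(-1.0, min(1.0, cosine))))
    return pvariance(angles)


# Hypothetical data: one 20-dimensional Gaussian cluster and one far point.
random.seed(0)
cluster = [[random.gauss(0.0, 1.0) for _ in range(20)] for _ in range(100)]
far_point = [10.0] * 20

inlier_score = angle_variance(cluster[0], cluster[1:])
outlier_score = angle_variance(far_point, cluster)
```

In this sketch the reference set plays a role loosely analogous to the small-scale calculation set described in the abstract: scoring a new point against a compact reference set rather than the full stream keeps the pairwise angle computation bounded. How DSOD actually selects and updates its normal and border sets is not specified here.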