Clustering-based outlier detection algorithms suffer from low efficiency and low accuracy when processing high-dimensional data streams. To address this problem, a clustering-based outlier detection algorithm for high-dimensional data streams (CODHD-Stream) is proposed. The algorithm first partitions the data stream using a sliding window and then reduces the dimensionality of the high-dimensional data set with an attribute reduction algorithm. Next, it applies K-means clustering with a distance-based information entropy filtering mechanism to divide the data set into micro-clusters and detects the outliers within those micro-clusters. Experimental results show that the proposed algorithm effectively improves both the efficiency and the accuracy of outlier detection in high-dimensional data streams.
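The pipeline summarized in the abstract (windowing, dimensionality reduction, micro-clustering, outlier detection) can be illustrated with a minimal Python sketch. It is not the CODHD-Stream implementation, because the abstract does not specify the attribute reduction algorithm or the distance-based information entropy filter. As labeled stand-ins, the sketch assumes consecutive non-overlapping windows, keeps the highest-variance attributes in place of attribute reduction, and flags points that fall in very small micro-clusters or lie unusually far from their centroid in place of the entropy filter. All function names and parameters (`sliding_windows`, `reduce_attributes`, `kmeans`, `detect_outliers`, `codhd_sketch`, `keep`, `min_size`, `z`) are hypothetical.

```python
import math
from statistics import mean, pstdev, pvariance


def sliding_windows(stream, size):
    """Yield consecutive, non-overlapping windows of `size` points (assumed scheme)."""
    window = []
    for point in stream:
        window.append(point)
        if len(window) == size:
            yield window
            window = []
    if window:
        yield window  # final, possibly shorter, window


def reduce_attributes(window, keep):
    """Stand-in for attribute reduction: keep the `keep` highest-variance attributes."""
    dims = range(len(window[0]))
    ranked = sorted(dims, key=lambda d: pvariance([p[d] for p in window]), reverse=True)
    chosen = sorted(ranked[:keep])
    return [tuple(p[d] for d in chosen) for p in window], chosen


def kmeans(points, k, iters=20):
    """Plain Lloyd's K-means with deterministic farthest-first initialization."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))

    def assign():
        return [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]

    for _ in range(iters):
        labels = assign()
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # leave an empty cluster's centroid unchanged
                centroids[j] = tuple(mean(col) for col in zip(*members))
    return assign(), centroids


def detect_outliers(points, labels, centroids, min_size=2, z=2.0):
    """Stand-in for the entropy filter: flag tiny clusters and far-from-centroid points."""
    dists = [math.dist(p, centroids[lab]) for p, lab in zip(points, labels)]
    cutoff = mean(dists) + z * pstdev(dists)
    sizes = {lab: labels.count(lab) for lab in set(labels)}
    return [i for i, (d, lab) in enumerate(zip(dists, labels))
            if sizes[lab] < min_size or d > cutoff]


def codhd_sketch(stream, window_size, keep, k):
    """Run the four stages window by window; return stream indices of flagged points."""
    flagged = []
    for w, window in enumerate(sliding_windows(stream, window_size)):
        reduced, _ = reduce_attributes(window, keep)
        labels, centroids = kmeans(reduced, k)
        offset = w * window_size
        flagged += [offset + i for i in detect_outliers(reduced, labels, centroids)]
    return flagged


# Toy data: two tight groups plus one distant point; attributes 2 and 3 are constant.
demo_stream = [
    (0.0, 0.0, 1.0, 5.0), (0.1, 0.0, 1.0, 5.0), (0.0, 0.1, 1.0, 5.0),
    (5.0, 5.0, 1.0, 5.0), (5.1, 5.0, 1.0, 5.0), (5.0, 5.1, 1.0, 5.0),
    (20.0, -20.0, 1.0, 5.0),
]
print(codhd_sketch(demo_stream, window_size=7, keep=2, k=3))  # prints [6]
```

On this toy input the two constant attributes are dropped, the remaining two attributes yield three micro-clusters, and the distant point ends up alone in its own micro-cluster, which is why it is flagged. Processing one window at a time keeps memory use bounded by the window size, which is the usual motivation for windowing a stream.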