The dynamic, time-ordered, and high-volume nature of data streams makes mining them difficult and challenging. Based on a study and analysis of the Squeezer clustering algorithm, this paper proposes O-Squeezer, a new clustering-based algorithm for detecting outliers in data streams. The data stream is treated as a process that changes over time and is divided into many data partitions; within each data block, the improved O-Squeezer algorithm is used to mine outliers. Theoretical analysis and experimental results show that the algorithm can effectively discover local outliers in a data stream and that it is feasible.
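The block-wise scheme described above can be illustrated with a minimal Python sketch. It is not the O-Squeezer algorithm itself, whose details the abstract does not give. It assumes categorical records, a basic one-pass Squeezer-style clustering step, and a simple rule that treats members of small clusters as outliers. All names and parameters (`Cluster`, `squeezer`, `stream_outliers`, `block_size`, `threshold`, `min_size`) are hypothetical.

```python
from collections import Counter
from itertools import islice


class Cluster:
    """Cluster summary: per-attribute value counts plus member indices."""

    def __init__(self, n_attrs):
        self.size = 0
        self.members = []
        self.counts = [Counter() for _ in range(n_attrs)]

    def add(self, idx, record):
        self.size += 1
        self.members.append(idx)
        for counter, value in zip(self.counts, record):
            counter[value] += 1

    def similarity(self, record):
        # Sum over attributes of the relative frequency of the record's value.
        return sum(c[v] / self.size for c, v in zip(self.counts, record))


def squeezer(block, threshold):
    """One pass over a block: join the most similar cluster or start a new one."""
    clusters = []
    for idx, record in enumerate(block):
        best = max(clusters, key=lambda c: c.similarity(record), default=None)
        if best is None or best.similarity(record) < threshold:
            best = Cluster(len(record))
            clusters.append(best)
        best.add(idx, record)
    return clusters


def block_outliers(block, threshold, min_size):
    # Assumption: records in clusters smaller than min_size are outliers.
    return [block[i]
            for c in squeezer(block, threshold) if c.size < min_size
            for i in c.members]


def stream_outliers(stream, block_size, threshold, min_size):
    """Split the stream into consecutive blocks and report outliers per block."""
    it = iter(stream)
    while block := list(islice(it, block_size)):
        yield block_outliers(block, threshold, min_size)
```

Because each block is clustered independently, an outlier here is local to its block, which is one plausible reading of the "local outliers" mentioned above. The values of `threshold` and `min_size` would need tuning for real data.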