滑动窗口是一种对最近一段时间内的数据进行挖掘的有效的技术,本文提出一种基于滑动窗口的流数据频繁项挖掘算法.算法采用了链表队列策略大大简化了算法,提高了挖掘的效率.对于给定的阈值S、误差ε和窗口长度n,算法可以检测在窗口内频度超过Sn的数据流频繁项,且使误差在εn以内.算法的空间复杂度为O(ε-1),对每个数据项的处理和查询时间均为O(1).在此基础上,我们还将该算法进行了扩展,可以通过参数的变化得到不同的流数据频繁项挖掘算法,使得算法的时间和空间复杂度之间得到调节.通过大量的实验证明,本文算法比其它类似算法具有更好的精度以及时间和空间效率.
The sliding window is an effective approach to mine frequent data itmes in the recent period of time.proposed an algorithm for mining frequent items in a stream based on sliding window.The algorithm adopts linked queue strategy which greatly improves the efficiency of the algorithm.Given a threshold S,an error bound ε and the length of the sliding window n,our algorithm can determinately detect the data items within the current window whose frequncy exceeds Sn with an error less than εn using O(ε-1) memory space,and the processing time for each data item and the query time are both O(1).Based on this algorithm,we have proposed a general framework for mining frequent items in data stream based on sliding window.Under this framework,different algorithms can be constructed by changing the parameters which could adjust the time complexity and space cost of the algorithm.Through extensive experiments,we show that our algorithm outperforms other methods in terms of the accuracy,memory requirement,and processing speed.