Existing algorithms for mining closed frequent itemsets over data streams often ignore the influence of historical patterns on the mining result and rely on an auxiliary structure to mark the type of each closed frequent itemset, which makes them inefficient. To address this, a method named NEWT-moment is proposed for mining closed frequent itemsets in a time window of a data stream. The method records pattern information completely while scanning the stream transactions only once. Its pruning method reduces both the space complexity of the sliding-window tree (F-tree) and the maintenance cost of the closed frequent pattern tree (NEWT-tree). In addition, a time-decay mechanism distinguishes the influence of historical patterns from that of recent patterns on the mining result, and because NEWT-tree stores the closed frequent itemsets directly, they can be read quickly at any time. Compared with the T-moment algorithm, NEWT-moment does not need to delete historical data, record transaction timestamps, or mark individual nodes, which lowers its time and space complexity. Extensive experimental results show that NEWT-moment achieves good efficiency and accuracy.
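The two ideas the abstract combines, time-decayed support and closedness, can be illustrated with a minimal sketch. It is not the NEWT-moment algorithm: it enumerates itemsets by brute force rather than maintaining an F-tree or NEWT-tree, and the exponential decay rule, the `decay` and `min_support` parameters, and both function names are assumptions introduced only for illustration.

```python
from itertools import combinations
import math


def decayed_supports(transactions, decay=0.9):
    """One pass over the stream; returns {itemset: decayed support}.

    Assumed decay rule (not taken from the paper): before each new
    transaction is counted, every existing support is multiplied by `decay`.
    """
    support = {}
    for txn in transactions:
        # Age all itemsets seen so far, so older transactions weigh less.
        for key in support:
            support[key] *= decay
        # Brute-force enumeration of the non-empty subsets of this transaction.
        # This is exponential in transaction length; for illustration only.
        items = sorted(set(txn))
        for size in range(1, len(items) + 1):
            for combo in combinations(items, size):
                key = frozenset(combo)
                support[key] = support.get(key, 0.0) + 1.0
    return support


def closed_frequent(support, min_support):
    """Keep itemsets that are frequent and have no proper superset
    with the same (decayed) support."""
    frequent = {s: c for s, c in support.items() if c >= min_support}
    closed = {}
    for s, c in frequent.items():
        # `s < t` tests for a proper subset on frozensets.
        absorbed = any(s < t and math.isclose(c, ct) for t, ct in frequent.items())
        if not absorbed:
            closed[s] = c
    return closed
```

For the stream `[["a", "b"], ["a", "b", "c"], ["a", "c"]]` with `decay=0.5` and `min_support=1.0`, the sketch keeps `{a}` (support 1.75) and `{a, c}` (support 1.5). `{c}` is frequent but not closed, because its superset `{a, c}` has the same decayed support, while `{a, b}` has decayed to 0.75 and falls below the threshold.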