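Building on a study of existing algorithms, this paper proposes IDSG, an algorithm for mining frequent sequences. IDSG builds an association graph over frequent items (rather than frequent itemsets, so the full set of frequent itemsets need not be computed first) and, working on a vertical representation of the database, obtains the complete set of frequent sequences through simple temporal joins. The whole process scans the original database only twice, which effectively reduces disk I/O. In addition, the proper use of optimization strategies helps reduce the number of candidate sequences. Analysis and experiments show that IDSG is markedly more efficient than comparable algorithms.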
A new frequent sequence mining algorithm, IDSG, is proposed. IDSG finds frequent sequences using an association graph built among frequent items. The whole process needs to scan the original database only twice, which effectively reduces disk I/O. In addition, proper use of optimization strategies helps decrease the number of candidate sequences. Analysis and experimental results show that IDSG is considerably more efficient than other algorithms of the same kind.
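
To make the general idea concrete, the following is a minimal Python sketch of graph-guided frequent sequence mining over frequent items with two database scans. It is not the IDSG algorithm as specified in the paper: the abstract does not give the graph construction, the join, or the optimization strategies, so everything here is an assumption made for illustration. In particular, the function name `mine_frequent_sequences`, the helper names, the restriction to one item per time step, the rule that an edge a -> b exists when the 2-sequence <a, b> is frequent, and the depth-first growth order are all hypothetical.

```python
from bisect import bisect_right
from collections import defaultdict


def mine_frequent_sequences(db, min_support):
    """Return {pattern tuple: support} for all frequent item sequences.

    Assumption: `db` is a list of sequences, each a list of single items
    ordered by time. Support = number of sequences containing the pattern
    as a (not necessarily contiguous) subsequence.
    """
    # Scan 1: count how many sequences contain each item.
    counts = defaultdict(int)
    for seq in db:
        for item in set(seq):
            counts[item] += 1
    frequent = {i for i, c in counts.items() if c >= min_support}

    # Scan 2: vertical representation, item -> {sequence id: sorted positions}.
    vertical = {i: {} for i in frequent}
    for sid, seq in enumerate(db):
        for pos, item in enumerate(seq):
            if item in frequent:
                vertical[item].setdefault(sid, []).append(pos)

    def temporal_join(ends, item):
        """Append `item` to a pattern; `ends` maps sid -> earliest end position."""
        joined = {}
        for sid, end in ends.items():
            positions = vertical[item].get(sid)
            if positions:
                k = bisect_right(positions, end)  # first occurrence after `end`
                if k < len(positions):
                    joined[sid] = positions[k]
        return joined

    # Earliest occurrence of each frequent item in each sequence.
    first = {i: {sid: p[0] for sid, p in vertical[i].items()} for i in frequent}

    # Assumed association graph: edge a -> b when <a, b> is frequent.
    graph = {
        a: [b for b in sorted(frequent)
            if len(temporal_join(first[a], b)) >= min_support]
        for a in frequent
    }

    result = {}

    def grow(pattern, ends):
        result[pattern] = len(ends)
        # Only follow graph edges: if <x, y> is infrequent, no frequent
        # pattern can end in x immediately followed by y in the pattern.
        for nxt in graph[pattern[-1]]:
            extended = temporal_join(ends, nxt)
            if len(extended) >= min_support:
                grow(pattern + (nxt,), extended)

    for item in sorted(frequent):
        grow((item,), first[item])
    return result
```

For the toy database `[["a", "b", "c"], ["a", "c", "b"], ["a", "b"], ["b", "c"]]` with `min_support=2`, the sketch reports the single items plus the 2-sequences <a, b>, <a, c>, and <b, c>. The graph restricts each extension step to items known to follow the last item of the pattern in enough sequences, which is one plausible way such a graph can cut down the number of candidates.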