类Apriori算法在产生频繁模式时需要多次扫描数据库,并且产生大量的候选集;Free Span和Prefix Span等基于投影数据库的算法在产生频繁模式时会产生大量的投影数据库,占用很多内存空间,这些都造成了很大的冗余。针对以往序列挖掘算法存在的不足,提出一种高效的序列挖掘算法——基于位置信息的序列挖掘算法PBSMA(Position-Based Sequence Mining Algorithm)。PBSMA算法通过记录频繁子序列的位置信息来减少对数据库的扫描,利用位置信息逐渐扩大频繁模式的长度,并且借鉴关联矩阵的思想和Prefix Span算法中前缀的概念,深度优先去寻找更长的关键模式。实验结果证明,无论在时间还是空间上,PBSMA算法都比Prefix Span算法更高效。
Similar to Apriori algorithm in generating frequent patterns need to scan the database several times, and generate a large number of candidate sets. Algorithms based on the projection database, such as FreeSpan and PrefixSpan , in generating frequent patterns will produce a large number of projection database, taking up a lot of memory space, which have caused a lot of redundancy. Aiming at the shortcomings of the previous sequence mining algorithms, an efficient sequence mining algorithm named PBSMA is proposed in this paper. The PBSMA reduces the scanning of the database by recording the position information of frequent subsequences, and gradually enlarges the length of the frequent patterns by using the position information. The algorithm uses the idea of association matrix and the concept of prefix in PrefixSpan algorithm to search for a longer key pattern. The experimental results show that the PBSMA is more efficient than PrefixSpan algorithm both in time and space.