This paper presents HVSM, a first-horizontally-last-vertically scanning database sequential pattern mining algorithm based on large-itemset reuse. HVSM redefines the length of a sequential pattern, which increases the granularity at which sequential patterns are mined. The algorithm represents the database as a vertical bitmap. It first extends itemsets horizontally and collects all mined large itemsets as one-large-sequence itemsets. It then extends sequences vertically, treating each one-large-sequence itemset as a "container" (an integrated block) and reusing the large itemsets when mining k-large sequences. Candidate large sequences are generated by using sibling (brother) nodes as seeds for child nodes, and support is counted by recording the 1st-TID. Experiments show that, on medium-sized and large transaction databases, HVSM finds frequent sequences faster than the SPAM algorithm.
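The two phases described above (horizontal itemset extension, then vertical sequence extension that reuses large itemsets as blocks) can be illustrated with a small Python sketch. It is not the authors' implementation. The per-customer integer bitmaps, the function names (`large_itemsets`, `extend_first_tid`, `mine_sequences`), and the reading of the 1st-TID as "the earliest transaction at which each customer's match of the sequence ends" are assumptions made for illustration. Support is counted as the number of customers whose sequence contains the pattern.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

Bitmaps = Dict[int, int]  # customer id -> bitmap; bit t set = present in transaction t

def item_bitmaps(db: List[List[Set[str]]]) -> Dict[str, Bitmaps]:
    """Vertical bitmap representation of the database, one entry per item."""
    bms: Dict[str, Bitmaps] = {}
    for cid, seq in enumerate(db):
        for t, trans in enumerate(seq):
            for item in trans:
                per = bms.setdefault(item, {})
                per[cid] = per.get(cid, 0) | (1 << t)
    return bms

def and_bitmaps(a: Bitmaps, b: Bitmaps) -> Bitmaps:
    """Horizontal (itemset) extension: AND per customer, dropping empty results."""
    return {cid: bm & b.get(cid, 0) for cid, bm in a.items() if bm & b.get(cid, 0)}

def large_itemsets(db, min_sup: int) -> Dict[FrozenSet[str], Bitmaps]:
    """Phase 1: mine every large itemset (a hypothetical Apriori-style loop)."""
    singles = {frozenset([i]): bm for i, bm in item_bitmaps(db).items() if len(bm) >= min_sup}
    result, frontier = dict(singles), list(singles.items())
    while frontier:
        nxt = []
        for iset, bm in frontier:
            for single, sbm in singles.items():
                (item,) = single
                if item > max(iset):  # extend in lexicographic order, no duplicates
                    ext = and_bitmaps(bm, sbm)
                    if len(ext) >= min_sup:
                        nxt.append((iset | single, ext))
        result.update(nxt)
        frontier = nxt
    return result

def lowest_bit(bm: int) -> int:
    return (bm & -bm).bit_length() - 1

def extend_first_tid(first: Dict[int, int], block: Bitmaps) -> Dict[int, int]:
    """Vertical extension: earliest transaction of `block` strictly after each 1st-TID."""
    out = {}
    for cid, f in first.items():
        later = block.get(cid, 0) >> (f + 1)
        if later:
            out[cid] = f + 1 + lowest_bit(later)
    return out

def mine_sequences(db, min_sup: int) -> Dict[Tuple[FrozenSet[str], ...], int]:
    """Phase 2: grow sequences of large itemsets, reusing their bitmaps as blocks."""
    blocks = sorted(large_itemsets(db, min_sup).items(), key=lambda kv: sorted(kv[0]))
    found: Dict[Tuple[FrozenSet[str], ...], int] = {}

    def grow(prefix, first, candidates):
        kids = []
        for iset, bm in candidates:
            nf = extend_first_tid(first, bm)
            if len(nf) >= min_sup:
                kids.append((iset, bm, nf))
        siblings = [(iset, bm) for iset, bm, _ in kids]
        for iset, _, nf in kids:
            seq = prefix + (iset,)
            found[seq] = len(nf)
            grow(seq, nf, siblings)  # children are drawn only from frequent siblings

    for iset, bm in blocks:
        first = {cid: lowest_bit(b) for cid, b in bm.items()}
        found[(iset,)] = len(first)
        grow((iset,), first, blocks)
    return found

if __name__ == "__main__":
    db = [[{"a"}, {"b", "c"}, {"a", "c"}], [{"a", "b"}, {"c"}], [{"a"}, {"b"}, {"c"}]]
    for seq, sup in mine_sequences(db, 2).items():
        print([sorted(s) for s in seq], sup)
```

Restricting the children of a node `s+X` to the frequent extensions of its parent `s` is safe: if `s+X+Y` is frequent, then its subsequence `s+Y` is frequent as well, so `Y` must already be one of `X`'s siblings. Tracking only the earliest ending transaction per customer is also sufficient, because greedy earliest matching decides subsequence containment correctly.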