Dongli Research Big Data Discovery System (DRDS)

A Sequential Pattern Mining Algorithm Based on Large-Itemset Reuse

ISSN: 1000-1239
Journal: Journal of Computer Research and Development (《计算机研究与发展》)
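Classification: TP311 [Automation and Computer Technology / Computer Software and Theory; Automation and Computer Technology / Computer Science and Technology]
Author affiliation: [1] School of Computer Science, National University of Defense Technology, Changsha 410073, China
Funding: National Natural Science Foundation of China (60573136); National High-Tech Research and Development Program of China (863 Program) (2003AA142010)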

Keywords: sequential pattern mining, bitmap representation, itemset extension, sequence extension

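Chinese abstract (translated):

Building on a redefinition of the length of a sequential pattern, which increases the granularity of sequential pattern mining, this paper proposes HVSM, a sequential pattern mining algorithm based on large-itemset reuse. The algorithm represents the database with vertical bitmaps. It first extends itemsets horizontally and collects all mined large itemsets as one-large-sequence itemsets. It then extends sequences vertically, treating each one-large-sequence itemset as a "building block" so that large itemsets are reused when mining k-large sequences. Candidate large sequences are generated using sibling nodes as seeds, and support is counted with the 1st-TID. Experiments show that, for large transaction databases, the algorithm effectively improves mining efficiency.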
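The horizontal phase (vertical bitmaps plus itemset extension, with sibling nodes as seeds) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' HVSM code: it assumes each item's vertical bitmap is stored per customer sequence as a Python int whose bit t is set when the item occurs in transaction t, that support counts customer sequences, and that an itemset's children are formed only by joining it with its later frequent siblings. All function names (`build_vertical_bitmaps`, `and_bitmaps`, `support`, `mine_large_itemsets`) are hypothetical.

```python
# Minimal sketch (assumed layout, not the authors' code): each item maps to
# {sequence_id: bitmask}, where bit t is set if the item occurs in transaction t.


def build_vertical_bitmaps(db):
    """Build per-item vertical bitmaps from a list of customer sequences."""
    bitmaps = {}
    for sid, seq in enumerate(db):
        for t, itemset in enumerate(seq):
            for item in itemset:
                per_seq = bitmaps.setdefault(item, {})
                per_seq[sid] = per_seq.get(sid, 0) | (1 << t)
    return bitmaps


def and_bitmaps(a, b):
    """Itemset extension: keep only transactions where both patterns occur."""
    out = {}
    for sid in a.keys() & b.keys():
        mask = a[sid] & b[sid]
        if mask:
            out[sid] = mask
    return out


def support(bm):
    """Support = number of customer sequences with at least one set bit."""
    return len(bm)


def mine_large_itemsets(db, min_sup):
    """Horizontal phase: grow itemsets by joining each node with its later siblings."""
    bitmaps = build_vertical_bitmaps(db)
    level1 = [((item,), bitmaps[item]) for item in sorted(bitmaps)
              if support(bitmaps[item]) >= min_sup]
    large = {}

    def grow(siblings):
        # siblings: frequent itemsets sharing one prefix, in sorted order
        for k, (itemset, bm) in enumerate(siblings):
            large[itemset] = bm
            children = []
            for other, other_bm in siblings[k + 1:]:
                child_bm = and_bitmaps(bm, other_bm)
                if support(child_bm) >= min_sup:
                    children.append((itemset + (other[-1],), child_bm))
            grow(children)

    grow(level1)
    return large


if __name__ == "__main__":
    db = [[{"a", "b"}, {"c"}], [{"a", "b", "c"}], [{"a"}, {"b"}]]
    large = mine_large_itemsets(db, min_sup=2)
    print(sorted((k, support(v)) for k, v in large.items()))
    # [(('a',), 3), (('a', 'b'), 2), (('b',), 3), (('c',), 2)]
```

Joining only siblings that share a prefix means the AND of two sibling bitmaps already marks exactly the transactions containing the combined itemset, so no rescan of the database is needed.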

English abstract:

This paper presents HVSM, a sequential pattern mining algorithm based on large-itemset reuse that scans the database first horizontally and then vertically. The algorithm redefines the length of a sequential pattern, which increases the granularity of sequential pattern mining. Representing the database as vertical bitmaps, the algorithm first extends itemsets horizontally and mines all large itemsets, which are called one-large-sequence itemsets. It then extends sequences vertically, treating each one-large-sequence itemset as a "container" when mining k-large sequences. Candidate large sequences are generated by taking sibling nodes as child nodes, and support is counted by recording the 1st-TID. Experiments show that HVSM finds frequent sequences faster than the SPAM algorithm on medium-sized and large transaction databases.
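The vertical phase (sequence extension with reused large itemsets and 1st-TID support counting) can be sketched in Python as below. This is a minimal sketch under assumptions, not the authors' HVSM implementation: it assumes the 1st-TID of a prefix in a customer sequence is the earliest transaction in which the prefix is complete (similar to SPAM's S-step bitmap transformation), that `blocks` is the list of large itemsets produced by the horizontal phase, that a k-large sequence consists of k such blocks, and that a node's children are tried only with blocks that successfully extended its parent (its frequent siblings). All function names are hypothetical.

```python
# Minimal sketch (assumptions, not the authors' HVSM code): bitmaps are
# {sequence_id: bitmask}; the "1st-TID" is taken to be the earliest transaction
# where the prefix is complete, similar to SPAM's S-step transformation.


def build_vertical_bitmaps(db):
    """Per-item bitmaps: bit t set if the item occurs in transaction t."""
    bitmaps = {}
    for sid, seq in enumerate(db):
        for t, itemset in enumerate(seq):
            for item in itemset:
                per_seq = bitmaps.setdefault(item, {})
                per_seq[sid] = per_seq.get(sid, 0) | (1 << t)
    return bitmaps


def itemset_bitmap(bitmaps, itemset):
    """Bitmap of a large itemset (a reusable block): AND of its item bitmaps."""
    out = dict(bitmaps.get(itemset[0], {}))
    for item in itemset[1:]:
        other = bitmaps.get(item, {})
        out = {sid: m & other[sid] for sid, m in out.items()
               if sid in other and m & other[sid]}
    return out


def first_tid(mask):
    """Index of the lowest set bit: the first transaction completing the prefix."""
    return (mask & -mask).bit_length() - 1


def s_extend(prefix_bm, block_bm):
    """Sequence extension: block occurrences strictly after the prefix's 1st-TID."""
    out = {}
    for sid, mask in prefix_bm.items():
        if sid in block_bm:
            cut = first_tid(mask) + 1
            later = (block_bm[sid] >> cut) << cut  # clear bits up to the 1st-TID
            if later:
                out[sid] = later
    return out


def mine_sequences(db, blocks, min_sup):
    """Vertical phase: sequences of large itemsets; support = #customer sequences."""
    bitmaps = build_vertical_bitmaps(db)
    block_bms = {b: itemset_bitmap(bitmaps, b) for b in blocks}
    frequent = [b for b in blocks if len(block_bms[b]) >= min_sup]
    result = {}

    def grow(seq, bm, seeds):
        result[seq] = len(bm)
        # seeds: candidate blocks inherited from the parent level
        children = [(b, s_extend(bm, block_bms[b])) for b in seeds]
        children = [(b, cbm) for b, cbm in children if len(cbm) >= min_sup]
        kept = [b for b, _ in children]  # frequent siblings seed the next level
        for b, child_bm in children:
            grow(seq + (b,), child_bm, kept)

    for b in frequent:
        grow((b,), block_bms[b], frequent)
    return result


if __name__ == "__main__":
    db = [[{"a", "b"}, {"c"}], [{"a", "b"}, {"c"}, {"a"}], [{"a"}, {"b"}]]
    blocks = [("a",), ("b",), ("c",), ("a", "b")]  # assumed itemset-phase output
    for seq, sup in sorted(mine_sequences(db, blocks, min_sup=2).items()):
        print(seq, sup)
```

Restricting a child's candidates to its frequent siblings is safe because a sequence <P, X, Y> contains <P, Y> as a subsequence, so if <P, Y> is infrequent, <P, X, Y> cannot be frequent either.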