Building on existing association rule mining algorithms, this paper proposes a new method that generates all itemsets by iteratively appending items as suffixes of existing itemsets. It introduces a new data structure, the index array, which stores the discovered frequent 1-itemsets together with their related information so that the relationships between itemsets and transactions can be found quickly. Based on the index array, it then presents a new frequent itemset mining algorithm that discovers all frequent itemsets while scanning the database only twice. Experimental results show that the algorithm effectively improves the efficiency of frequent itemset mining and outperforms similar state-of-the-art algorithms.
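The abstract does not say what "related information" an index-array entry holds or how the suffix order is fixed. The Python below is a minimal sketch under stated assumptions, not the authors' implementation. Each entry is assumed to store a frequent item together with the set of IDs of the transactions that contain it. This is a vertical, Eclat-style TID-set layout and may differ from the paper's. Entries are assumed to be ordered by ascending support, and an itemset is only ever extended with items that come later in that order, so each itemset is generated exactly once. The names `IndexEntry`, `build_index_array`, and `mine_frequent_itemsets` are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexEntry:
    """One slot of the (hypothetical) index array: a frequent 1-item plus
    the IDs of the transactions containing it (assumed layout)."""
    item: str
    tids: frozenset

    @property
    def support(self) -> int:
        return len(self.tids)


def build_index_array(transactions, min_support):
    # Database scan 1: count the support of every single item.
    counts = Counter(item for t in transactions for item in set(t))
    frequent = {item for item, c in counts.items() if c >= min_support}

    # Database scan 2: record which transactions contain each frequent item.
    tids = {item: set() for item in frequent}
    for tid, t in enumerate(transactions):
        for item in frequent.intersection(t):
            tids[item].add(tid)

    # Fix an order (ascending support, ties broken by name); mining only
    # appends later entries as suffixes, so no itemset is produced twice.
    entries = [IndexEntry(item, frozenset(s)) for item, s in tids.items()]
    entries.sort(key=lambda e: (e.support, e.item))
    return entries


def mine_frequent_itemsets(transactions, min_support):
    index = build_index_array(transactions, min_support)
    result = {}

    def extend(prefix, prefix_tids, start):
        # Try appending each later index entry as the new suffix item.
        for i in range(start, len(index)):
            entry = index[i]
            # Support of the extended itemset = size of the TID intersection;
            # the transactions themselves are never reread here.
            new_tids = (prefix_tids & entry.tids) if prefix else entry.tids
            if len(new_tids) >= min_support:
                itemset = prefix + (entry.item,)
                result[itemset] = len(new_tids)
                # Any superset has a TID set no larger than new_tids, so only
                # frequent itemsets are extended further.
                extend(itemset, new_tids, i + 1)

    extend((), frozenset(), 0)
    return result


if __name__ == "__main__":
    db = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
    for itemset, support in sorted(mine_frequent_itemsets(db, 3).items()):
        print(itemset, support)
```

With the five sample transactions and `min_support = 3`, the sketch reports `a`, `b`, and `c` (support 4 each) and the pairs `(a, b)`, `(a, c)`, and `(b, c)` (support 3 each). `{a, b, c}` is dropped because it occurs in only two transactions. All support counting after the second scan is done by intersecting the stored TID sets. That is how this particular sketch keeps to the two database scans the abstract mentions.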