频繁项集挖掘是关联规则挖掘中的关键任务,非常耗费时间.为提高频繁项集的产生效率,提出一种基于倒排索引位运算的深度优先频繁项集挖掘算法(DF-FIMBII).该算法以二进制数组存储项目到事务的倒排索引,通过位运算计算两个项目的支持计数,并采用深度优先搜索策略递归地挖掘不同的k-频繁项集.在chess、mushroom、pumb_star、T40I10D100K等数据集上,对DF-FIMBII、Apriori、ECLAT、BitTableFI、Index-BitTableFI等算法进行了实验比较.实验结果表明,在数据规模不是非常巨大和支持度较小的情况下,无论数据集的稠密程度如何,DF-FIMBII均具有较好的时间优越性.
Frequent itemset mining is a very important procedure in the association rule mining.A Depth First Frequent Itemset Mining Based on Bittable and Inverted Index(DF-FIMBII) is proposed in this paper.DF-FIMBII stores the inverted index of the vertical structure of items to transactions with bittables arrays,and calculates the support counts of two items through the bit operations.Furthermore,DF-FIMBII computes different k-frequent itemsets with a depth first search strategy.We compared the execution time of DF-FIMBII,Apriori,ECLAT,BitTableFI and Index-BitTableFI in four datasets such as chess,mushroom,pumb_star and T40I10D100K.The experiment results show that DF-FIMBII can improve the time consumption when the volume of dataset is not very large and the support value is relative small.