以格论及位图索引技术为基础给出了一个新的频繁项目集发现算法.1)该算法利用有向图进行一次性数据预处理,在预处理过程中将数据库预先存贮为每个结点都用一个域来记录其支持度的项目集格,从而把复杂的频繁项目集的发现问题转化为图搜索问题,提高了频繁项目集发现过程的效率.2)支持度计算是关联规则发现中I/O及计算开销都非常大,算法引入了位图索引技术,提高了项目集支持度的计算速度.存储完整位图需要较大空间,针对该问题算法对位图进行了分块管理并对其进行了有效的编码压缩;不仅可以有效地对原始位图进行有效压缩,另外也可以在较大程度上提高支持度的计算效率.最后,对算法进行了计算实验与分析.
It is well known that the task of finding frequent itemsets is the key problem in association rules mining. A new algorithm based on the lattice theory and bitmap index for mining frequent itemsets is proposed. 1 ) The algorithm converts the original transaction database to an itemsets-lattice in the preprocessing, where each itemset vertex has a label to save its support. So the complicated task of mining frequent itemsets in the database can be changed to simpler ones of searching vertex in the lattice, which can speeds up greatly the mining process. 2) Support counting in the association rules mining requires a great I/O and computing cost. A bitmap index technique to speed up the counting process is employed. Saving the intact bitmap usually has a big space requirement. Herein each bit vector is partitioned into some blocks, and hence every bit block is encoded as a shorter symbol. Not only the original bitmap is impacted, but also the support counting efficiency is improved efficiently in this way. 3) Experimental and analytical results are presented in the end.