频繁闭项集的挖掘是发现数据项之间关联规则的一种有效方式.当前以MapReduce模式为基础的云计算平台为解决海量数据中的关联规则挖掘问题提供新的解决思路.文中提出并实现一种基于Hadoop云计算平台的频繁闭项集的并行挖掘算法.该算法主要包括并行计数、构造全局频繁项表、并行挖掘局部频繁闭项集和并行筛选全局频繁闭项集四个步骤.在多个数据集上的实验表明,该方法能较大提高数据挖掘的效率,具有较好的加速比.
Closed frequent itemset mining is an useful way for discovering association rules from data. Cloud computing infrastructure based on MapReduce provides a promising solution to address the problem. A parallel algorithm for mining closed frequent itemset is presented based on the Hadoop cloud computing platform. The method consists of four steps : parallel counting, global F-List constructing, parallel mining of local closed frequent itemset and parallel filtrating of global closed frequent itemset. The experimental results validate the method and show that it is effective with a satisfied speedup.