A straightforward MapReduce implementation of parallel Apriori emits a large number of key/value pairs whose value is 1, which degrades the algorithm's efficiency. This paper proposes a parallel Apriori algorithm based on a grouped counting strategy, which effectively reduces the number of key/value pairs generated. Experimental results show that the improved MapReduce-based parallel Apriori algorithm achieves a substantial improvement in running time, and that its speedup increases linearly as the number of cluster nodes grows.
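The difference between the two map strategies can be illustrated with a minimal single-process sketch. It is not the paper's implementation: the abstract does not specify how groups are formed, so the sketch assumes that a group is a fixed-size batch of consecutive transactions, and it counts every k-item combination rather than a pruned candidate set. The names `naive_map`, `grouped_map`, `reduce_counts`, and `group_size` are hypothetical.

```python
from collections import Counter
from itertools import combinations


def naive_map(transactions, k):
    """Simple strategy: emit one (itemset, 1) pair per k-itemset occurrence."""
    for t in transactions:
        for itemset in combinations(sorted(set(t)), k):
            yield itemset, 1


def grouped_map(transactions, k, group_size):
    """Grouped strategy (assumed form): count k-itemsets locally within each
    batch of transactions, then emit one (itemset, local_count) pair per
    distinct itemset in that batch."""
    for start in range(0, len(transactions), group_size):
        local = Counter()
        for t in transactions[start:start + group_size]:
            local.update(combinations(sorted(set(t)), k))
        yield from local.items()


def reduce_counts(pairs, min_support):
    """Sum the counts per itemset and keep those meeting the minimum support."""
    totals = Counter()
    for itemset, count in pairs:
        totals[itemset] += count
    return {s: c for s, c in totals.items() if c >= min_support}


# Toy data, invented for illustration only.
transactions = [
    ["a", "b", "c"],
    ["a", "b"],
    ["a", "c"],
    ["b", "c"],
]

naive_pairs = list(naive_map(transactions, 2))
grouped_pairs = list(grouped_map(transactions, 2, group_size=4))

# Both strategies yield the same frequent itemsets; only the number of
# intermediate key/value pairs differs.
frequent_naive = reduce_counts(naive_pairs, min_support=2)
frequent_grouped = reduce_counts(grouped_pairs, min_support=2)
print(len(naive_pairs), len(grouped_pairs))
```

On this toy input the grouped mapper emits fewer intermediate pairs than the naive one while the reducer output is unchanged, which is the effect the abstract attributes to grouped counting. In an actual MapReduce job the saving would show up as less data shuffled between the map and reduce phases.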