To address the performance bottleneck that traditional data mining algorithms face when processing massive data sets, this paper studies the FP-Growth algorithm and proposes PCL-FP, a parallel FP-Growth algorithm for cloud computing environments based on composite linked list mining. Rather than building an FP-Tree and conditional FP-Trees, the algorithm mines frequent patterns using composite linked lists. The improved algorithm was validated on data sets of different sizes. The results show that PCL-FP effectively improves efficiency, offers good flexibility and scalability, and can be widely applied to mining frequent itemsets from massive data.
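As a point of orientation only, the general idea of mining frequent itemsets over partitioned data can be sketched as below. This is not the PCL-FP algorithm: the abstract does not describe the composite linked list structure, so the sketch uses plain brute-force counting instead. The function names `count_partition` and `frequent_itemsets`, the sample transactions, and the support threshold are all hypothetical. Each partition is counted independently (the step that a cloud platform would run in parallel), and the partial counts are then merged and filtered by minimum support.

```python
from collections import Counter
from itertools import combinations


def count_partition(transactions, size):
    """Map step (illustrative): count every itemset of `size` in one partition."""
    counts = Counter()
    for transaction in transactions:
        # De-duplicate and sort so each itemset has one canonical tuple form.
        for itemset in combinations(sorted(set(transaction)), size):
            counts[itemset] += 1
    return counts


def frequent_itemsets(partitions, size, min_support):
    """Reduce step (illustrative): merge partial counts, keep frequent itemsets."""
    total = Counter()
    for partition in partitions:
        # Run sequentially here; in a cloud setting each call would be a
        # separate task on a different node.
        total.update(count_partition(partition, size))
    return {itemset: n for itemset, n in total.items() if n >= min_support}


# Hypothetical data: two partitions of a small transaction database.
partitions = [
    [["a", "b", "c"], ["a", "b"]],
    [["a", "c"], ["b", "d"], ["a", "b", "c"]],
]
result = frequent_itemsets(partitions, size=2, min_support=3)
print(result)
```

Brute-force enumeration grows combinatorially with transaction length, which is the cost that FP-Growth and its variants are designed to avoid. The sketch only illustrates the partition, count, and merge pattern, and the meaning of a support threshold.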