Research on an Improved Association Rule Algorithm Based on Spark

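ISSN: 0258-7998
Journal: Application of Electronic Technique (《电子技术应用》)
Date: 0
Classification: TP391 [Automation and Computer Technology: Computer Application Technology; Automation and Computer Technology: Computer Science and Technology]
Author affiliation: School of Electronic Information Engineering, Taiyuan University of Science and Technology, Taiyuan, Shanxi 030024, China
Funding: National Natural Science Foundation of China (41272374)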

Authors: Ye Lu (叶璐), Dong Zengshou (董增寿)

Keywords: association rules, Apriori, MapReduce, Hadoop, Spark

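Abstract (translated from the Chinese):

The Apriori association rule algorithm suffers from long computation cycles and low efficiency when faced with massive data in the era of information explosion. To address this, the data are stored in a specific data structure to reduce the number of passes over the data; a pruning operation is performed before the join operation, and the pruning criterion is changed; and the improved algorithm, IApriori, is combined with Apache Spark, an in-memory parallel computing framework for big data, to yield an improved Apriori algorithm based on Spark (Spark+IApriori). Experimental results show that Spark+IApriori outperforms the Apriori algorithm in both cluster scalability and speedup.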

English abstract:

The Apriori association rule algorithm suffers from long computation cycles and low efficiency when faced with huge amounts of data in the era of information explosion. This paper stores the data in a specific data structure to reduce the number of passes over the data, performs the pruning operation before the self-join of itemsets, and changes the pruning criterion. By combining the improved algorithm with the Spark computing framework, it proposes an improved Spark-based algorithm (Spark+IApriori). Experimental results show that Spark+IApriori achieves better data scalability and speedup than Apriori.
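
The two single-machine ideas named in the abstract (a storage structure that avoids repeated data scans, and pruning before the join) can be illustrated with a small Python sketch. The abstract does not say which data structure or which pruning criterion the paper uses, so both choices here are assumptions: the sketch stores a vertical layout (item to transaction-id set) and applies one well-known pre-join filter, namely that every item of a frequent k-itemset must occur in at least k-1 frequent (k-1)-itemsets. The function names are hypothetical, and this is not the authors' IApriori or its Spark integration.

```python
from collections import defaultdict
from itertools import combinations


def build_tidsets(transactions):
    """Map each item to the set of transaction ids containing it (one scan)."""
    tidsets = defaultdict(set)
    for tid, items in enumerate(transactions):
        for item in items:
            tidsets[item].add(tid)
    return tidsets


def prune_before_join(frequent, k):
    """Drop (k-1)-itemsets that cannot be part of any frequent k-itemset.

    Assumed criterion (not taken from the paper): each item of a frequent
    k-itemset occurs in k-1 of its (k-1)-subsets, so an item seen in fewer
    than k-1 frequent (k-1)-itemsets cannot contribute to level k.
    """
    counts = defaultdict(int)
    for itemset in frequent:
        for item in itemset:
            counts[item] += 1
    return [s for s in frequent if all(counts[i] >= k - 1 for i in s)]


def apriori(transactions, min_support):
    """Return {frozenset: support count} for all frequent itemsets."""
    tidsets = build_tidsets(transactions)
    # Level 1: frequent single items, each paired with its tid-set.
    level = {frozenset([i]): t for i, t in tidsets.items()
             if len(t) >= min_support}
    result = {s: len(t) for s, t in level.items()}
    k = 2
    while level:
        survivors = prune_before_join(list(level), k)
        next_level = {}
        for a, b in combinations(survivors, 2):
            candidate = a | b
            if len(candidate) != k or candidate in next_level:
                continue
            # Classic Apriori check: every (k-1)-subset must be frequent.
            if any(candidate - {i} not in level for i in candidate):
                continue
            # Support comes from a set intersection, not a rescan of the data.
            tids = level[a] & level[b]
            if len(tids) >= min_support:
                next_level[candidate] = tids
        result.update({s: len(t) for s, t in next_level.items()})
        level = next_level
        k += 1
    return result


transactions = [
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "c"},
    {"b", "c"},
    {"a", "b", "c"},
]
frequent = apriori(transactions, min_support=3)
```

With these five transactions and a minimum support count of 3, each single item and each pair is frequent, while the triple {a, b, c} appears in only two transactions and is rejected. How the paper distributes this level-wise loop across a Spark cluster is not described in the abstract and is not shown here.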
