An Optimized Direct Anonymous Attestation Protocol Scheme

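ISSN: 1000-1239
Journal: 计算机研究与发展 (Journal of Computer Research and Development)
Time: 0
Pages: -
Classification: TP301.6 [Automation and Computer Technology: Computer System Architecture; Automation and Computer Technology: Computer Science and Technology]
Author affiliations: [1] College of Computer Science, Sichuan Normal University, Chengdu, Sichuan 610101; [2] Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190; [3] Sichuan Institute of Computer Sciences, Chengdu, Sichuan 610041
Funding: National Natural Science Foundation of China (61373162); Sichuan Province Science and Technology Support Program (2014GZ007).
Related project: Research on Attestation Problems under the TCG Framework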

Keywords: parallel processing, data mining, association rule, Spark, Apriori

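Abstract (translated from the Chinese):

The traditional Apriori algorithm is limited by processing speed and computing resources, and the Map-Reduce framework on the Hadoop platform cannot handle node failures, does not support iterative computation well, and cannot compute in memory. To address these problems, an optimized parallel association rule algorithm on Spark is proposed. The algorithm needs only two scans of the transaction database and makes full use of Spark's in-memory RDDs to store itemsets. Compared with the traditional Apriori algorithm, it scans the transaction database far fewer times; compared with Apriori on Hadoop, it simplifies the computation, supports iteration, and reduces I/O cost by caching intermediate results in memory. Experimental results show that the algorithm improves the efficiency of association rule mining on large-scale data.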

English abstract:

To address the bottlenecks of the traditional Apriori algorithm in processing speed and computing resources, and the fact that Map-Reduce on Hadoop cannot handle node failures, does not support iterative computation well, and cannot compute in memory, a parallel association rule optimization algorithm based on Spark is proposed. The algorithm needs to scan the transaction database only twice and takes advantage of Spark's RDD storage structure. Compared with the traditional Apriori, Apriori based on Spark greatly reduces the number of database scans; compared with Apriori based on Hadoop, it incurs less I/O overhead because it supports iterative computation and stores temporary results in memory. Experimental results show that Apriori based on Spark performs effectively when mining association rules on big data.
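
The abstract does not give the algorithm itself, so the following is only an illustrative sketch of how frequent itemsets can be mined with two passes over the transaction data. It is plain Python rather than Spark code, and it substitutes a vertical layout (each itemset mapped to the set of transaction IDs that contain it, in the style of Eclat) for whatever structure the paper actually uses. An in-memory dictionary stands in for a cached RDD. The function name `frequent_itemsets` and the parameter `min_support` are assumptions made for this example.

```python
from collections import defaultdict
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {frozenset(itemset): support count} using two passes over the data.

    Hypothetical helper for illustration; not the paper's implementation.
    """
    # Pass 1: count how many transactions contain each single item.
    counts = defaultdict(int)
    for transaction in transactions:
        for item in set(transaction):
            counts[item] += 1
    frequent_items = {item for item, n in counts.items() if n >= min_support}

    # Pass 2: map each frequent item to the IDs of the transactions containing it.
    # This dictionary plays the role a cached RDD would play in Spark (assumption).
    current = defaultdict(set)
    for tid, transaction in enumerate(transactions):
        for item in set(transaction):
            if item in frequent_items:
                current[frozenset([item])].add(tid)

    result = {itemset: len(tids) for itemset, tids in current.items()}

    # Later levels reuse the in-memory ID sets, so the data is never rescanned.
    k = 2
    while current:
        next_level = {}
        for a, b in combinations(list(current), 2):
            candidate = a | b
            if len(candidate) != k or candidate in next_level:
                continue
            # Apriori pruning: every (k-1)-subset must itself be frequent.
            if any(candidate - {x} not in current for x in candidate):
                continue
            # Support of the candidate is the size of the ID-set intersection.
            tids = current[a] & current[b]
            if len(tids) >= min_support:
                next_level[candidate] = tids
        result.update((itemset, len(tids)) for itemset, tids in next_level.items())
        current = next_level
        k += 1
    return result
```

Calling `frequent_itemsets(transactions, min_support)` on a list of item lists returns the support count of every itemset that appears in at least `min_support` transactions. In a Spark setting the per-level dictionaries would presumably be distributed and cached, which is where the reduced I/O described in the abstract would come from.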
