A Parallel Strategy for Mining Maximal Frequent Itemsets

ISSN: 1000-7180
Journal: Microelectronics & Computer (《微电子学与计算机》)
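Classification: TP338.6 [Automation and Computer Technology: Computer System Architecture; Automation and Computer Technology: Computer Science and Technology]
Author affiliations: [1] Communication Command Academy, Wuhan, Hubei 430010; [2] Huawei Technologies, Shenzhen, Guangdong 518129
Funding: National Natural Science Foundation of China (60603069)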

Keywords: maximal frequent itemsets, parallel strategy, data mining

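Abstract (translated from the Chinese):

A parallelization strategy based on gene itemsets, GP, is proposed to exploit the pruning power of serial algorithms. The basic idea is to use the complete inclusion relation among gene itemsets to assign equivalence classes greedily across processors, and to partition and replicate database records according to the needs of each equivalence class, so that the processors can compute asynchronously. This yields good load balance, high pruning efficiency, and little replication of database records, and it shortens the algorithm's execution time. Analysis and experiments show that the parallel algorithm based on the GP strategy scales well and outperforms existing algorithms of the same kind.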

English abstract:

Mining frequent itemsets is a crucial problem in data mining applications, and its complexity has been shown to be NP-hard. Parallel techniques are widely used to improve the efficiency of mining algorithms. This paper proposes GP, a novel parallel strategy for mining maximal frequent itemsets. The basic idea is to increase pruning efficiency by distributing work greedily among the processors according to the complete inclusion relation of gene itemsets, and to selectively duplicate database records on demand for each equivalence class, so that each processor can compute the frequent itemsets independently. These techniques eliminate the need for synchronization and drastically cut I/O overhead. Analysis and experimental results demonstrate that the approach is more efficient than previous work.
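
The outline in the abstract (greedy assignment of equivalence classes to processors, then copying to each processor only the records its classes need) can be illustrated with a minimal sketch. This is not the paper's GP algorithm: the abstract does not specify the gene-itemset inclusion relation, so the sketch assumes single-item prefix classes, a caller-supplied weight estimate per class, and least-loaded-first placement. The function names `assign_classes_greedily` and `replicate_records` are hypothetical.

```python
from collections import defaultdict


def assign_classes_greedily(class_weights, n_processors):
    """Place each equivalence class on the currently least-loaded processor.

    class_weights maps a class prefix (assumed here to be a single item) to an
    estimated workload, e.g. the support count of that item. Classes are
    visited heaviest first; ties are broken by prefix for determinism.
    """
    loads = [0] * n_processors
    assignment = defaultdict(list)
    ordered = sorted(class_weights.items(), key=lambda kv: (-kv[1], kv[0]))
    for prefix, weight in ordered:
        target = min(range(n_processors), key=lambda p: loads[p])
        assignment[target].append(prefix)
        loads[target] += weight
    return dict(assignment), loads


def replicate_records(transactions, assignment):
    """Give each processor only the transactions that contain at least one
    of its assigned class prefixes (selective replication, as assumed here)."""
    local = {}
    for proc, prefixes in assignment.items():
        wanted = set(prefixes)
        local[proc] = [t for t in transactions if wanted & set(t)]
    return local


if __name__ == "__main__":
    transactions = [("a", "b", "c"), ("a", "c"), ("b", "d"), ("c", "d"), ("a", "d")]
    # Assumed weight estimate: number of transactions containing the item.
    weights = defaultdict(int)
    for t in transactions:
        for item in t:
            weights[item] += 1
    assignment, loads = assign_classes_greedily(dict(weights), 2)
    local = replicate_records(transactions, assignment)
    print(assignment, loads, {p: len(rows) for p, rows in local.items()})
```

Once each processor holds its own classes and the matching records, it could run a serial maximal-itemset miner on its local data without synchronizing with the others, which is the property the abstract emphasizes.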
