Association Rule Mining Algorithm Based on Disk Table Resident FP-TREE (基于磁盘表存储FP-TREE的关联规则挖掘算法)

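ISSN: 1000-1239
Journal: Journal of Computer Research and Development (计算机研究与发展)
Date: not given (recorded as 0)
Pages: 1313-1322
Classification: TP311.13 [Automation and Computer Technology: Computer Software and Theory; Automation and Computer Technology: Computer Science and Technology]
Author affiliations: [1] Department of Information Management and Information Systems, Jiangsu University, Zhenjiang, Jiangsu 212013; [2] Department of Computer Science and Communication Engineering, Jiangsu University, Zhenjiang, Jiangsu 212013
Funding: National Natural Science Foundation of China (70971067); National Key Technology R&D Program (2010BAl88800); Basic Research Program of Jiangsu Province (BK2010331); Doctoral Student Innovation Program (CX10B-016X); Senior Talent Foundation of Jiangsu University (08JDG057)
Related project: Research on an Audit Service System for Corporate Fraud Analysis Based on DM (data mining) Technology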

Keywords: FP-TREE, association rules, disk resident, frequent itemsets, DTRFP-GROWTH, FP-GROWTH, data mining

Chinese abstract (translated):

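As the databases to be mined keep growing, the memory available to the system has become the bottleneck for association rule mining with the FP-GROWTH algorithm. Disk-based association rule mining has therefore become an important research direction, because it removes the memory limit and allows association rules to be mined from large databases. To this end, the original FP-TREE data structure is improved and a novel disk-table-based algorithm, DTRFP-GROWTH (disk table resident FP-TREE growth), is proposed. The algorithm stores the FP-TREE in a disk table to reduce memory usage. When the traditional FP-GROWTH algorithm consumes so much memory that mining cannot proceed, DTRFP-GROWTH can still complete the mining task. This makes it suitable for scenarios where space efficiency takes priority. The algorithm also integrates association rule mining with a relational database. This overcomes problems of file-system-based approaches, such as low efficiency and high development difficulty. Verification experiments and a performance analysis were carried out on real data sets. The results show that, when memory is limited, DTRFP-GROWTH is an effective disk-based association rule mining algorithm.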
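The abstract does not describe the table layout DTRFP-GROWTH uses. The following is only a minimal sketch of the general idea of keeping an FP-TREE in a disk table rather than in linked in-memory nodes. It is not the authors' implementation. The table `fp_node(id, item, cnt, parent)`, its index, and the function `build_disk_fp_tree` are hypothetical names, and SQLite (through Python's standard `sqlite3` module) stands in for whatever RDBMS the paper used.

```python
import sqlite3
from collections import Counter

# Hypothetical schema (not from the paper): one row per FP-TREE node,
# with node 0 as the null root. The (parent, item) index replaces
# in-memory child pointers.
SCHEMA = """
CREATE TABLE IF NOT EXISTS fp_node (
    id     INTEGER PRIMARY KEY,
    item   TEXT,
    cnt    INTEGER NOT NULL,
    parent INTEGER
);
CREATE INDEX IF NOT EXISTS fp_child ON fp_node (parent, item);
"""


def build_disk_fp_tree(conn, transactions, min_support):
    """Store an FP-TREE in the fp_node table and return the frequent items.

    Expects a fresh database. Pass 1 counts item supports. Pass 2 inserts
    each transaction's frequent items, sorted by descending support, as a
    path from the root; child lookups are indexed queries, so only the
    current path is held in Python memory.
    """
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO fp_node (id, item, cnt, parent) VALUES (0, NULL, 0, NULL)")

    support = Counter(item for t in transactions for item in set(t))
    frequent = {i: s for i, s in support.items() if s >= min_support}

    for t in transactions:
        # Frequent items only, ordered by (-support, item) as in FP-GROWTH.
        path = sorted((i for i in set(t) if i in frequent),
                      key=lambda i: (-frequent[i], i))
        parent = 0
        for item in path:
            row = conn.execute(
                "SELECT id FROM fp_node WHERE parent = ? AND item = ?",
                (parent, item)).fetchone()
            if row is not None:
                # Shared prefix: increment the existing node's count.
                conn.execute("UPDATE fp_node SET cnt = cnt + 1 WHERE id = ?", (row[0],))
                parent = row[0]
            else:
                # New branch: insert a child row under `parent`.
                cur = conn.execute(
                    "INSERT INTO fp_node (item, cnt, parent) VALUES (?, 1, ?)",
                    (item, parent))
                parent = cur.lastrowid
    conn.commit()
    return frequent


if __name__ == "__main__":
    # ":memory:" keeps the demo self-contained; pass a file path such as
    # "fptree.db" to keep the tree on disk instead.
    conn = sqlite3.connect(":memory:")
    tx = [["a", "b"], ["b", "c", "d"], ["a", "b", "c"], ["a", "b", "d"], ["b", "c"]]
    print(build_disk_fp_tree(conn, tx, min_support=2))
    for row in conn.execute("SELECT id, item, cnt, parent FROM fp_node ORDER BY id"):
        print(row)
```

For brevity the transactions are a Python list here. A genuinely disk-based variant would stream them from the database in both passes. Every child lookup becomes an indexed query, which trades speed for memory. That trade-off matches the abstract's positioning of the algorithm for space-priority scenarios.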

English abstract:

As the databases to be mined keep growing, the available physical memory has become a bottleneck when the FP-GROWTH algorithm is used for association rule mining. New algorithms that scale in space are therefore needed to mine association rules in huge databases, and disk-resident algorithms have become the main research target. Accordingly, the original FP-TREE data structure is improved and a novel algorithm called DTRFP-GROWTH (disk table resident FP-TREE growth) is presented. This algorithm stores the FP-TREE in a disk table to decrease memory usage. When FP-GROWTH fails because it uses too much memory, DTRFP-GROWTH can continue mining association rules from a huge database by keeping its FP-TREE in a disk table. This makes it suitable for scenarios where space efficiency takes priority. In addition, the algorithm integrates association rule mining with a relational database management system (RDBMS), overcoming problems of related file-system-based solutions such as low performance and high development difficulty. The correctness of the algorithm is verified and its performance is analyzed, and experimental results on real-world data sets support its effectiveness. The study shows that when memory is limited, DTRFP-GROWTH is an effective disk-based association rule mining algorithm.
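One way to read "integrates association rules mining with RDBMS" is that tree traversal moves into SQL. The following minimal sketch shows that idea for the core FP-GROWTH step of collecting an item's conditional pattern base (its prefix paths and counts). It uses SQLite's `WITH RECURSIVE` common table expressions over a hypothetical node table `fp_node(id, item, cnt, parent)`. The table, the toy tree in `ROWS`, and the helper names are illustrative assumptions, not the paper's schema or code.

```python
import sqlite3
from collections import Counter

# Hypothetical FP-TREE rows (id, item, cnt, parent); node 0 is the null root.
# Tree: root -> b:5 -> {a:3 -> {c:1, d:1}, c:2 -> d:1}
ROWS = [
    (0, None, 0, None),
    (1, "b", 5, 0),
    (2, "a", 3, 1),
    (3, "c", 2, 1),
    (4, "d", 1, 3),
    (5, "c", 1, 2),
    (6, "d", 1, 2),
]

PREFIX_PATHS_SQL = """
WITH RECURSIVE walk(start_id, start_cnt, node_id, depth) AS (
    -- Every node carrying the target item starts a walk at its parent.
    SELECT id, cnt, parent, 0 FROM fp_node WHERE item = ?
    UNION ALL
    -- Climb one parent pointer per step until the root is reached.
    SELECT w.start_id, w.start_cnt, n.parent, w.depth + 1
    FROM walk AS w JOIN fp_node AS n ON n.id = w.node_id
    WHERE n.parent IS NOT NULL
)
SELECT w.start_id, w.start_cnt, n.item
FROM walk AS w JOIN fp_node AS n ON n.id = w.node_id
WHERE n.item IS NOT NULL          -- drop the null root
ORDER BY w.start_id, w.depth DESC -- root-to-leaf order within each path
"""


def load_tree(conn, rows):
    """Create the hypothetical fp_node table and fill it with `rows`."""
    conn.execute("CREATE TABLE fp_node (id INTEGER PRIMARY KEY, item TEXT, "
                 "cnt INTEGER NOT NULL, parent INTEGER)")
    conn.executemany("INSERT INTO fp_node VALUES (?, ?, ?, ?)", rows)


def conditional_pattern_base(conn, item):
    """Return [(prefix_path, count), ...] for `item`, traversed inside SQLite."""
    paths = {}
    for start_id, cnt, path_item in conn.execute(PREFIX_PATHS_SQL, (item,)):
        paths.setdefault(start_id, ([], cnt))[0].append(path_item)
    return [(tuple(p), c) for _, (p, c) in sorted(paths.items())]


def conditional_supports(base):
    """Support of each item within a conditional pattern base."""
    support = Counter()
    for path, count in base:
        for it in path:
            support[it] += count
    return support


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load_tree(conn, ROWS)
    base = conditional_pattern_base(conn, "d")
    print(base)
    print(conditional_supports(base))
```

In this toy tree, the base for `d` is `[(("b", "c"), 1), (("b", "a"), 1)]`. With a minimum support of 2, only `b` survives there, so `{b, d}` would be frequent with support 2. A full FP-GROWTH pass would then recurse on that conditional tree.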

同期刊论文项目