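Research on an Important Association Rule Mining Algorithm Based on Heap Sort
  • ISSN: 1673-629X
  • Journal: Computer Technology and Development (《计算机技术与发展》)
  • Date: 0
  • Classification: TP301 [Automation and Computer Technology: Computer System Architecture; Automation and Computer Technology: Computer Science and Technology]
  • Author affiliation: School of Computer Science, North China University of Technology, Beijing 100144
  • Funding: National Natural Science Foundation of China (61371143); Beijing Natural Science Foundation (4132026)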
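Chinese abstract (translated):

In existing association rule mining algorithms and methods, a large share of the time needed to obtain rules goes to scanning associated itemsets, repeatedly scanning the database, and generating redundant candidate frequent itemsets. Traditional methods yield fairly comprehensive mining results, but not every rule in those results is important. Earlier methods do not single out the important association rules, so their results are less effective and do not help in obtaining the important target results that are needed. For mining important targets, an improved Apriori algorithm based on heap sort and a linked-list structure is proposed. The algorithm scans the database, counts how often each itemset occurs across all transactions, and heap-sorts the itemsets by frequency. From the resulting heap it obtains all candidate k-itemsets and computes their support, then compares each itemset's support with the preset minimum support. If the minimum support is met, the corresponding frequent itemset is added to the linked list; otherwise the item is removed according to the pruning strategy. The candidate (k+1)-itemsets produced by the join operation are processed in the same way to generate the frequent (k+1)-itemsets. This largely optimizes the generation of frequent itemsets, speeds up the generation of important association rules, and improves overall running speed.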

English abstract:

Existing association rule mining algorithms and methods spend a large share of their rule-acquisition time on scanning associated itemsets, repeatedly scanning the database, and generating redundant candidate frequent itemsets. Traditional methods can produce fairly comprehensive mining results, but not all of the rules in those results are important. Because traditional methods do not reflect the importance of association rules, their mining results are less effective, and they are not conducive to obtaining the main target results. Aimed at mining important targets, an improved Apriori algorithm based on a linked-list structure and heap sort is proposed. The algorithm scans the whole database to get the frequency with which each itemset appears across all transactions, and heap-sorts the itemsets by that frequency. Then, according to the established heap, all candidate k-itemsets are obtained and their support is calculated. The support of each itemset is compared with the preset minimum support. If the minimum support is met, the corresponding frequent itemset is added to the linked list; otherwise it is cut according to the pruning strategy. Through the join operation, the candidate (k+1)-itemsets are obtained from the generated frequent k-itemsets and processed in the same way to generate the frequent (k+1)-itemsets. In this way, the generation of frequent itemsets is greatly optimized and the generation of important association rules is accelerated, which improves the overall speed of operation.
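The level-wise procedure described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the page gives no pseudocode, so the function names (`count_support`, `join`, `heap_apriori`) are hypothetical, Python's `heapq` stands in for the heap sort, and a plain Python list stands in for the linked list. Rule generation from the frequent itemsets is not shown.

```python
import heapq
from collections import Counter
from itertools import combinations


def count_support(transactions, candidates):
    """Count how many transactions contain each candidate itemset."""
    counts = Counter()
    for t in transactions:
        for c in candidates:
            if c <= t:
                counts[c] += 1
    return counts


def join(frequent, k):
    """Apriori join: (k+1)-itemsets whose k-item subsets are all frequent."""
    frequent_set = set(frequent)
    out = set()
    for a, b in combinations(frequent, 2):
        u = a | b
        if len(u) == k + 1 and all(
            frozenset(s) in frequent_set for s in combinations(u, k)
        ):
            out.add(u)
    return out


def heap_apriori(transactions, min_support):
    """Return (itemset, support) pairs, most frequent first within each level.

    Assumption: min_support is a fraction of the number of transactions.
    """
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)
    candidates = {frozenset([item]) for t in transactions for item in t}
    result = []  # assumption: a list standing in for the linked list
    k = 1
    while candidates:
        counts = count_support(transactions, candidates)
        # Max-heap on count (negated); ties are broken by the sorted items.
        heap = [(-cnt, tuple(sorted(c))) for c, cnt in counts.items()]
        heapq.heapify(heap)
        frequent = []
        while heap:
            neg_count, items = heapq.heappop(heap)
            support = -neg_count / n
            if support < min_support:
                break  # everything still in the heap is rarer, so prune it all
            frequent.append(frozenset(items))
            result.append((items, support))
        candidates = join(frequent, k)
        k += 1
    return result
```

Because itemsets leave the heap in descending order of frequency, the first itemset that falls below the minimum support ends the level, and the remaining candidates are discarded without further comparison. For example, `heap_apriori([{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}], 0.5)` keeps the three single items and the three pairs and prunes the triple.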

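Journal information
  • Computer Technology and Development (《计算机技术与发展》)
  • Chinese Science and Technology Core Journal
  • Supervising body: Shaanxi Provincial Department of Industry and Information Technology
  • Sponsor: Shaanxi Computer Society
  • Editor-in-chief: Wang Shouzhi (王守智)
  • Address: 99 South Section of Yanta Road, Xi'an
  • Postal code: 710054
  • Email: ctad@vip.163.com
  • Telephone: 029-85522163
  • International standard serial number: ISSN 1673-629X
  • Domestic unified serial number: CN 61-1450/TP
  • Postal distribution code: 52-127
  • Awards:
  • Outstanding Journal in Implementing the CAJ-CD Standard
  • Indexed in domestic and international databases:
  • Chinese Science and Technology Core Journal
  • Citations: 21263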