东篱科研大数据发现系统（DRDS）

位置：成果数据库 > 期刊 > 期刊详情页

基于随机相遇的频繁项集挖掘方法

ISSN号：1673-9787
期刊名称：《河南理工大学学报：自然科学版》
时间：0
分类：TP311[自动化与计算机技术—计算机软件与理论;自动化与计算机技术—计算机科学与技术]
作者机构：[1]河南理工大学计算机科学与技术学院,河南焦作454000
相关基金：国家自然科学基金资助项目（51274088）;河南省科技攻关计划项目（112102210004）;河南省高等学校矿山信息化重点实验室基金资助项目（J1202）.

关键词：数据挖掘, 频繁项集挖掘, 随机相遇算法, 随机相遇, 最小支持度, data mining, frequent item-set mining, random meeting algorithm, random meeting, minimumsupport degree

中文摘要：

频繁项集挖掘是关联规则挖掘的重要内容,而现有的频繁项集挖掘算法在数据库扫描和复杂数据结构构建方面消耗过多的时间,效率较低。为克服现有频繁项集挖掘算法的不足,提出了基于随机相遇的频繁项集挖掘算法。在随机相遇过程中,不断从原始事务集中随机挑选两条事务,将其交集作为新事务集中的元素,通过计算新事务集中最小支持度与原事务集中最小支持度的关系,将在原事务集上的频繁项集挖掘转化为在新事务集上的频繁项集挖掘,算法的时间复杂度和空间复杂度大大降低。由于随机样本蕴含原始数据集的主要统计特性,新事务集具有原事务集的统计特性,通过调整参数,算法在新事物集上挖掘结果的准确度可以得到保证。并利用一个零售超市的交易数据对该算法的有效性进行了测试。测试结果表明,该算法能将挖掘速度提升数十倍,同时挖掘结果的准确度和其它算法相差不大。

英文摘要：

An association rule mining algorithm based on random meeting is proposed to handle the inefficient problem of algorithms. During random meeting, two transactions are selected randomly from an original set, and then the intersection of the transactions is computed and taken as the elements of a new set. The frequent item mining on the original set can be instead on the new set by mapping the min support degree on the two sets. Due to the fact that the new set is smaller than the original one, and the statistical properties of the new set are similar to those of the original one, the complexity of the new algorithm can be reduced while getting the similar accuracy to existing algorithms. The new algorithm was evaluated based on the transactions of a supermarket. The experiment results showed that the new algorithm can improve the speed of frequent item-set mining by more than ten times while achieve the similar accuracy of the computing results compared with other algorithms.

同期刊论文项目