The classic Apriori algorithm for association rule mining must scan the database many times, which is time-consuming and computationally expensive, whereas scanning only a sample lowers mining accuracy. To address both problems, this paper studies the sampling operation and accuracy control in association rule mining by controlling the frequent itemsets of the sample and using frequent itemsets, in particular the frequent 1-itemsets, to guide sampling. On this basis it proposes HAC, a sampling-based association rule mining algorithm. Theoretical analysis and performance experiments show that HAC effectively reduces the size of the database, requires at least one fewer database scan, and improves mining efficiency without degrading accuracy.
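The trade-off described above, fewer full scans versus the risk of losing accuracy on a sample, can be illustrated with a generic sample-then-verify sketch. This is not the HAC algorithm, whose details are not given here. It is a minimal illustration that assumes a plain Apriori miner, a uniform random sample, and a lowered support threshold on the sample. The names `frequent_itemsets`, `sample_then_verify`, `slack`, and `seed` are hypothetical.

```python
import random
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Plain Apriori: return {itemset: support} for supports >= min_support."""
    n = len(transactions)
    tsets = [frozenset(t) for t in transactions]

    # Level 1: count single items in one pass.
    counts = {}
    for t in tsets:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c / n for s, c in counts.items() if c / n >= min_support}
    result = dict(current)

    k = 2
    while current:
        # Join frequent (k-1)-itemsets and prune candidates that have
        # an infrequent (k-1)-subset.
        candidates = set()
        for a, b in combinations(list(current), 2):
            u = a | b
            if len(u) == k and all(
                frozenset(s) in current for s in combinations(u, k - 1)
            ):
                candidates.add(u)
        # One pass over the data per level: this is the repeated scanning
        # that makes Apriori expensive on a large database.
        counts = {c: sum(1 for t in tsets if c <= t) for c in candidates}
        current = {s: c / n for s, c in counts.items() if c / n >= min_support}
        result.update(current)
        k += 1
    return result


def sample_then_verify(transactions, min_support, sample_size, slack=0.8, seed=0):
    """Mine a random sample at a lowered threshold, then verify on the full data.

    `slack` (hypothetical parameter) lowers the threshold on the sample so
    that fewer truly frequent itemsets are missed.
    """
    rng = random.Random(seed)
    sample = rng.sample(transactions, sample_size)
    candidates = frequent_itemsets(sample, min_support * slack)

    # Single verification scan of the full database.
    n = len(transactions)
    tsets = [frozenset(t) for t in transactions]
    verified = {}
    for c in candidates:
        support = sum(1 for t in tsets if c <= t) / n
        if support >= min_support:
            verified[c] = support
    return verified
```

Every itemset returned by `sample_then_verify` is frequent in the full database, because its support is recounted there. Itemsets that are frequent overall but fall below the lowered threshold in the sample are still missed, which is the accuracy loss the abstract refers to.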