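By analyzing the fundamental problems of association rule mining, this paper identifies three shortcomings of the classical Apriori mining algorithm and improves on each: 1) the database mapping method is changed to avoid scanning the database repeatedly; 2) infrequent itemsets are identified and prevented from being joined with other items, which avoids generating a large number of candidate itemsets; 3) an intersection operation is used to reduce the excessive time spent matching candidate itemsets against transaction patterns. In addition, to verify the effectiveness of the improved algorithm, experiments are carried out on historical hydrological data. The results show that, for different values of support and confidence, the proposed improved algorithm, IM-Apriori, runs in less time and is more efficient.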
This paper studies the fundamental problems of association rule mining. Building on a review of classical mining algorithms and the inherent defects of the Apriori algorithm, it proposes three improvements. First, to avoid scanning the database repeatedly, it changes the way the database is mapped. Second, once the supports of the candidate itemsets have been obtained, each candidate is tested for frequency using the prior knowledge of the Apriori algorithm; if the candidate itemsets generated from an element of the existing frequent itemsets are certain not to be frequent, that element need not be joined with the others, which avoids producing a large number of candidates and optimizes the join step. Third, an intersection operation is introduced to reduce the excessive time Apriori spends matching candidate itemsets against transaction patterns. To verify its effectiveness, the optimized algorithm is applied to historical hydrological data. The experimental results show that it achieves shorter execution times under different support and confidence levels, and therefore higher efficiency.
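To make the three ideas concrete, the following is a minimal Python sketch and not the paper's IM-Apriori implementation. It assumes the changed database mapping is a vertical layout (each itemset mapped to the set of IDs of the transactions that contain it), assumes the pruning is the standard Apriori subset check, and computes support by intersecting those ID sets. The names `vertical_map` and `frequent_itemsets` and the absolute-count `min_support` parameter are hypothetical, chosen only for illustration.

```python
from itertools import combinations


def vertical_map(transactions):
    """Scan the database once, mapping each single item to its transaction-ID set."""
    tidsets = {}
    for tid, items in enumerate(transactions):
        for item in items:
            tidsets.setdefault(frozenset([item]), set()).add(tid)
    return tidsets


def frequent_itemsets(transactions, min_support):
    """Return {itemset: support count} for itemsets with count >= min_support."""
    # Level 1: keep only frequent single items; infrequent ones are never joined.
    level = {
        itemset: tids
        for itemset, tids in vertical_map(transactions).items()
        if len(tids) >= min_support
    }
    result = {}
    while level:
        result.update({itemset: len(tids) for itemset, tids in level.items()})
        next_level = {}
        for a, b in combinations(level, 2):
            candidate = a | b
            # Join only pairs that differ in exactly one item.
            if len(candidate) != len(a) + 1 or candidate in next_level:
                continue
            # Apriori property: every subset one item smaller must be frequent.
            if any(candidate - {x} not in level for x in candidate):
                continue
            # Support via intersection of ID sets, with no rescan of the database.
            tids = level[a] & level[b]
            if len(tids) >= min_support:
                next_level[candidate] = tids
        level = next_level
    return result
```

Under these assumptions the database is read only once, in `vertical_map`; every later support count comes from a set intersection rather than from matching candidates against each transaction.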