Building on an analysis of the efficiency bottleneck in Apriori-like algorithms, this paper proposes an efficient improved algorithm for mining frequent itemsets: the Classification Tree Based Association Rule (CTBAR) algorithm. CTBAR scans the database only twice. It stores the data in a classification tree, which reduces the number of database accesses, and it derives the frequent itemsets from all or part of that tree, which reduces the number of comparisons needed to compute them. By combining the Apriori and FP-tree algorithms, CTBAR improves mining efficiency and lowers both the time and space complexity of mining, while preserving the correctness of the mined results. Several experiments compare CTBAR with Apriori and its improved version; they show that CTBAR is faster than both by a factor of two to eight.
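The two-scan idea can be illustrated with a minimal Python sketch. It is not the CTBAR algorithm itself, since the abstract does not specify the layout of the classification tree. As an assumption, it uses an FP-tree-like prefix tree, and all names (`Node`, `build_tree`, `compressed_transactions`, `frequent_itemsets`) are hypothetical. The first scan counts single items, the second scan builds the tree, and the Apriori-style level-wise search then counts candidates against the tree contents only, without returning to the database.

```python
from collections import Counter
from itertools import combinations


class Node:
    """Prefix-tree node; `count` is the number of transactions passing through it."""
    def __init__(self, item=None):
        self.item = item
        self.count = 0
        self.children = {}


def build_tree(transactions, min_support):
    # `transactions` must be re-iterable (e.g. a list): it is read exactly twice.
    # Scan 1: count the support of every single item.
    item_counts = Counter(i for t in transactions for i in set(t))
    frequent = {i: c for i, c in item_counts.items() if c >= min_support}

    # Scan 2: insert each transaction, keeping only frequent items,
    # ordered by descending support (ties broken by item name).
    root = Node()
    for t in transactions:
        items = sorted((i for i in set(t) if i in frequent),
                       key=lambda i: (-frequent[i], i))
        node = root
        for item in items:
            node = node.children.setdefault(item, Node(item))
            node.count += 1
    return root, frequent


def compressed_transactions(node, prefix=()):
    """Yield (itemset, multiplicity) pairs recovered from the tree alone."""
    for child in node.children.values():
        path = prefix + (child.item,)
        # Transactions that end at this node rather than continuing deeper.
        ending_here = child.count - sum(g.count for g in child.children.values())
        if ending_here > 0:
            yield frozenset(path), ending_here
        yield from compressed_transactions(child, path)


def frequent_itemsets(transactions, min_support):
    root, frequent = build_tree(transactions, min_support)
    rows = list(compressed_transactions(root))  # no further database access
    result = {frozenset([i]): c for i, c in frequent.items()}
    current = list(result)
    k = 2
    while current:
        # Apriori-style step: size-k candidates whose (k-1)-subsets are all frequent.
        items = sorted({i for s in current for i in s})
        candidates = [frozenset(c) for c in combinations(items, k)
                      if all(frozenset(s) in result for s in combinations(c, k - 1))]
        counts = {c: sum(n for row, n in rows if c <= row) for c in candidates}
        current = [c for c, n in counts.items() if n >= min_support]
        result.update({c: counts[c] for c in current})
        k += 1
    return result
```

Calling `frequent_itemsets(transactions, min_support)` on a list of item lists returns a dictionary that maps each frequent itemset (a `frozenset`) to its support count. Because identical transaction prefixes share tree nodes, the candidate counting works on a compressed view of the data, which is one plausible reading of how a tree can reduce both database accesses and comparisons.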