Association rule mining is an important problem in data mining, and efficiently discovering frequent itemsets is its key step. Drawing on the statistical regularities of database transactions, and building on Apriori and its variants for discovering maximal frequent itemsets, this paper proposes a new level-wise algorithm for discovering maximal frequent itemsets. The algorithm first judges the frequency of the candidate set as a whole; then, while discovering maximal frequent itemsets, it applies three strategies (a holistic strategy, a sorting strategy, and a minimum strategy) to effectively reduce the number of comparisons between candidate itemsets and database transactions. Experimental results show that, for maximal frequent itemset discovery over databases with a large number of transactions, the algorithm is significantly more efficient than Apriori.
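For orientation, the baseline that the proposed algorithm is compared against can be sketched as follows. This is only a minimal, unoptimized level-wise Apriori search followed by a filter that keeps the maximal frequent itemsets. It does not implement the paper's three strategies, which the abstract does not specify. The function name `apriori_maximal` and the choice to express `min_support` as an absolute transaction count are assumptions made for this illustration.

```python
from itertools import combinations


def apriori_maximal(transactions, min_support):
    """Return the maximal frequent itemsets found by a plain level-wise search.

    transactions: iterable of item collections.
    min_support: minimum number of transactions (absolute count, an assumption).
    """
    transactions = [frozenset(t) for t in transactions]

    def support(itemset):
        # One full scan of the database per candidate: this is the
        # candidate/transaction comparison cost the paper aims to reduce.
        return sum(1 for t in transactions if itemset <= t)

    items = {item for t in transactions for item in t}
    level = {frozenset([i]) for i in items
             if support(frozenset([i])) >= min_support}
    frequent = set(level)
    k = 1
    while level:
        k += 1
        # Join step: merge pairs of frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in level
                             for s in combinations(c, k - 1))}
        level = {c for c in candidates if support(c) >= min_support}
        frequent |= level

    # A frequent itemset is maximal if no frequent proper superset exists.
    return {f for f in frequent if not any(f < g for g in frequent)}
```

As a usage example, for the five transactions `abc`, `ab`, `ac`, `bc`, `abc` and `min_support=3`, the three pairs `{a, b}`, `{a, c}`, `{b, c}` are maximal, because `{a, b, c}` appears in only two transactions. Lowering `min_support` to 2 makes `{a, b, c}` the single maximal frequent itemset.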