为提高不确定数据频繁模式(FP)挖掘算法的时空效率,提出了基于最大概率的不确定频繁模式挖掘(UFPM-MP)算法。首先,利用事务项集中的最大概率值预估期望支持数;然后,使用该期望支持数与最小期望支持数阈值进行比较,以确定某一项集是否为候选频繁项集,并对候选项集建立子树以递归挖掘频繁模式。实验中,UFPMMP算法与AT-Mine算法进行了对比,并在6个典型的数据集上进行实验验证。实验结果表明,UFPM-MP算法的时空效率得到了提高,稀疏数据集上提高约30%,稠密数据集上的效率提高更为明显(约3~4倍)。预估期望支持数的策略有效地减少了子树和头表项的数量,从而提高了算法的时空效率;且最小期望支持数越小,或需要挖掘的频繁模式越多的时候,算法的时间效率提高越多。
To improve the time and space efficiency of Frequent Pattern (FP) mining algorithm over uncertain dataset, the Uncertain Frequent Pattern Mining based on Max Probability (UFPM-MP) algorithm was proposed. First, the expected support number was estimated using maximum probability of the transaction itemset. Second, by comparing this expected support number to the minimum expected support number threshold, the candidate frequent itemsets were identified. Finally, the corresponding sub-trees were built for recursively mining frequent patterns. The UFPM-MP algorithm was tested on 6 classical datasets against the state-of-the-art algorithm AT (Array based tail node Tree structure)-Mine with positive results ( about 30% improvement for sparse datasets, and 3 - 4 times more efficient for dense datasets). The expected support number estimation strategy effectively reduces the number of sub-trees and items of header table, and improves the algorithm's time and space efficiency; and when the minimum expected support threshold is low or there are lots of potential frequent patterns, time efficiency of the proposed algorithm performs more remarkably.