Association rule mining methods for multiple data sources tend to be inefficient, either because the data nodes must exchange a very large number of candidate itemsets or because the mining process requires multiple scans of the database. This paper studies the relationship between the concepts of a pruned concept lattice (PCL) and the representation of frequent itemsets, defines the frequent itemsets derived from a pruned lattice, and proposes UMPCL (union algorithm of multiple pruned concept lattice), an algorithm that uses multiple PCLs to mine approximately all association rules from multiple data sources in horizontally partitioned databases. A single frequent concept represents a group of frequent itemsets, which reduces the number of candidate itemsets generated during mining and therefore the size of the messages exchanged between nodes. Each sub-concept lattice is pruned with a local support threshold equal to the global support threshold; the pruned sub-lattices are then merged and pruned again, and the global association rules are extracted from the result. Theoretical analysis and experiments show that the algorithm is effective.
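The prune-locally, merge, prune-globally idea described in the abstract can be illustrated with a minimal Python sketch. This is not the UMPCL algorithm itself: it assumes horizontally partitioned transactions, substitutes brute-force enumeration of closed itemsets (concept intents) for a real lattice construction, and recounts each candidate at every site. All function names and the sample data are hypothetical.

```python
from itertools import combinations


def support_count(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def closure(itemset, transactions):
    """Concept intent generated by `itemset`: the items shared by all
    transactions containing it, or None if no transaction contains it."""
    covering = [t for t in transactions if itemset <= t]
    if not covering:
        return None
    return frozenset.intersection(*covering)


def frequent_concepts(transactions, min_support):
    """Brute-force stand-in for a pruned lattice: {intent: count} for closed
    itemsets whose relative support is at least `min_support`."""
    items = sorted(set().union(*transactions))
    result = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            intent = closure(frozenset(combo), transactions)
            if intent is None:
                continue
            count = support_count(intent, transactions)
            if count >= min_support * len(transactions):
                result[intent] = count
    return result


def merge_and_prune(sites, min_support):
    """Union the locally frequent intents of all sites, recount them over
    every site, and keep those that reach the global threshold."""
    total = sum(len(site) for site in sites)
    candidates = set()
    for site in sites:
        # Assumption: the local threshold equals the global one.
        candidates |= set(frequent_concepts(site, min_support))
    merged = {}
    for intent in candidates:
        count = sum(support_count(intent, site) for site in sites)
        if count >= min_support * total:
            merged[intent] = count
    return merged


def extract_rules(merged, min_confidence):
    """Rules (antecedent, consequent, confidence) between nested intents."""
    rules = []
    for big, big_count in merged.items():
        for small, small_count in merged.items():
            confidence = big_count / small_count
            if small < big and confidence >= min_confidence:
                rules.append((small, big - small, confidence))
    return rules


# Hypothetical data: two sites holding four transactions each.
site_a = [frozenset("abc"), frozenset("ab"), frozenset("ac"), frozenset("bd")]
site_b = [frozenset("ab"), frozenset("abd"), frozenset("cd"), frozenset("a")]

merged = merge_and_prune([site_a, site_b], min_support=0.5)
rules = extract_rules(merged, min_confidence=0.7)
```

Using the global threshold locally is safe in one direction: an itemset whose relative support reaches the threshold over all transactions must reach it on at least one site, so it is not discarded everywhere. In this sketch the candidates are only intents that are closed and frequent on some site, so a globally frequent closed itemset that is not closed on any such site can still be missed, which makes the result approximate.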