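Based on an in-depth study of categorical data, this paper proposes an efficient method for mining multiple-level association rules. First, a domain knowledge-based item correlation model (DICM) is built from the domain knowledge of the categorical data, and the items are hierarchically clustered with this model. Then, the transaction database is reduced and partitioned according to the item clusters. Finally, the reduced and partitioned transaction databases are mapped onto a compact AFOPT-tree structure, and frequent itemsets are mined by traversing the AFOPT-tree instead of the original transaction database. Because the transaction database is smaller and the compact AFOPT structure is used, the method saves I/O time and greatly improves the efficiency of multiple-level association rule mining. Based on this method, a top-down multiple-level association rule mining algorithm, TD-CBP-MLARM, and a bottom-up one, BU-CBP-MLARM, are given. In addition, the method is extended to generalized association rule mining, yielding an efficient generalized association rule mining algorithm, CBP-GARM. Extensive experiments on randomly generated synthetic data show that the proposed multiple-level and generalized association rule mining algorithms not only guarantee the correctness and completeness of the mined frequent itemsets, but also achieve better mining efficiency and scalability than the latest existing algorithms of the same kind.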
This paper proposes an approach for mining multiple-level and generalized association rules. First, an item correlation model is built from domain knowledge, and the items are clustered according to their correlation. Second, the transaction database is reduced and partitioned according to the item clusters, which makes it smaller. Finally, the partitioned transaction databases are projected onto a compact structure called the AFOPT-tree, and the frequent itemsets are mined from the AFOPT-tree. Based on this approach, the paper presents a top-down algorithm, TD-CBP-MLARM, and a bottom-up algorithm, BU-CBP-MLARM, for mining multiple-level association rules. Additionally, the paper extends the approach to generalized association rule mining and gives a new efficient algorithm, CBP-GARM. Experiments show that the proposed algorithms not only guarantee the correctness and completeness of the mining results, but also outperform the latest existing algorithms of the same kind in mining efficiency and scalability.
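The partition-then-mine step can be illustrated with a minimal Python sketch. It is not the paper's implementation. The item clusters are assumed to be given (the abstract does not specify how the correlation model computes them), the function names and toy data are hypothetical, and naive support counting stands in for the AFOPT-tree traversal. The sketch also assumes that only itemsets lying inside a single cluster are of interest, which the abstract does not state explicitly.

```python
from collections import Counter
from itertools import combinations


def partition_by_clusters(transactions, clusters):
    """Project every transaction onto each item cluster.

    Returns one reduced transaction list per cluster. Empty projections
    are dropped, so no partition is larger than the original database.
    """
    partitions = []
    for cluster in clusters:
        projected = [frozenset(t) & cluster for t in transactions]
        partitions.append([p for p in projected if p])
    return partitions


def frequent_itemsets(transactions, min_count):
    """Naive support counting (a stand-in for the AFOPT-tree traversal)."""
    counts = Counter()
    for t in transactions:
        items = sorted(t)
        for size in range(1, len(items) + 1):
            for combo in combinations(items, size):
                counts[combo] += 1
    return {itemset: n for itemset, n in counts.items() if n >= min_count}


# Hypothetical toy data; the clusters are assumed to come from the
# item clustering step and are hard-coded here.
transactions = [
    {"milk", "bread", "pen"},
    {"milk", "bread"},
    {"bread", "pen", "paper"},
    {"milk", "paper", "pen"},
]
clusters = [frozenset({"milk", "bread"}), frozenset({"pen", "paper"})]

partitions = partition_by_clusters(transactions, clusters)
results = [frequent_itemsets(part, min_count=2) for part in partitions]
```

Each partition contains shorter transactions than the original database, and the second partition also contains fewer of them, because the transaction with no item from that cluster is dropped. This is the kind of size reduction the approach relies on before the frequent itemsets are mined.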