Developing dedicated distributed data mining algorithms is an effective way to improve the efficiency of data analysis and mining over distributed databases. The Iceberg concept lattice is a concise representation of frequent itemsets, and its assembly procedure can be parallelized. Building on these two properties, we mine global closed frequent itemsets in a distributed manner by assembling Iceberg concept lattices. Because the literature still offers little on the distributed assembly of Iceberg concept lattices, this paper first analyzes, from a theoretical standpoint, the limitations of building the global Iceberg concept lattice by subposition assembly of Iceberg concept lattices. It then demonstrates the feasibility of building the global Iceberg concept lattice by partial subposition assembly (semi-assembly). On this basis, we propose a distributed algorithm, frequent concept growth distribution (Frecogd), which is based on partial subposition assembly of Iceberg concept lattices, and apply it to mining global closed frequent itemsets in a homogeneous distributed environment. Experiments confirm the theoretical feasibility of the algorithm and also show that its mining efficiency needs further improvement.
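To make the setting concrete, the following minimal Python sketch computes closed frequent itemsets (the intents of the frequent concepts in an Iceberg concept lattice) by brute force, once over a global database and once per site. It is not the Frecogd algorithm. The two-site data, the 50% support threshold, and all function and variable names are illustrative assumptions, and the horizontal split of transactions is only one assumed reading of the homogeneous distributed setting.

```python
from itertools import combinations


def closure(itemset, transactions):
    """Galois closure: intersection of all transactions containing itemset."""
    covering = [t for t in transactions if itemset <= t]
    if not covering:
        return None  # itemset occurs nowhere, so it has no closure here
    return frozenset.intersection(*covering)


def closed_frequent_itemsets(transactions, min_count):
    """Brute-force map {closed itemset: support count}, count >= min_count."""
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(len(items) + 1):
        for combo in combinations(items, size):
            candidate = frozenset(combo)
            count = sum(1 for t in transactions if candidate <= t)
            if count >= min_count and closure(candidate, transactions) == candidate:
                result[candidate] = count
    return result


# Hypothetical horizontally partitioned database: same items, different rows.
site_a = [frozenset("abc"), frozenset("ab"), frozenset("ac")]
site_b = [frozenset("bc"), frozenset("abc"), frozenset("b")]

# Assumed relative threshold of 50%: 2 of 3 rows locally, 3 of 6 globally.
local_a = closed_frequent_itemsets(site_a, min_count=2)
local_b = closed_frequent_itemsets(site_b, min_count=2)
global_cfi = closed_frequent_itemsets(site_a + site_b, min_count=3)

# Globally closed and frequent itemsets that no single site reports.
missing_locally = [s for s in global_cfi if s not in local_a and s not in local_b]
```

In this toy data the itemset {b, c} is frequent globally but infrequent at site A, and {c} is closed globally but closed at neither site. This suggests why locally pruned lattices alone may not contain everything needed to recover the global result, which is the kind of gap a distributed assembly procedure has to close.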