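To meet the data mining requirements of distributed environments, this paper proposes DDMB, a distributed data mining model based on Bayesian networks. The paper elaborates on the concept of the attribute multi-branch tree in DDMB and the idea of using it to reflect the overall characteristics of the attributes of the datasets in a distributed environment. It introduces a method for constructing the attribute multi-branch tree by having mobile agents access the distributed datasets, describes in detail the algorithm for generating an integrated Bayesian network from the attribute multi-branch tree, and explains Bayesian network structure learning and parameter learning oriented to the attribute multi-branch tree, as well as the method for determining the minimum threshold of the dependency coefficient between attributes. Experimental results show that the model effectively addresses the heavy learning burden, large storage overhead, and low execution efficiency of earlier Bayesian network learning in distributed environments.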
This paper presents DDMB, a distributed data mining model based on Bayesian networks. It proposes the concept of the attribute multi-branch tree and the idea of using this tree to reflect the characteristics of the attributes across distributed datasets. It also introduces a method for building the attribute multi-branch tree using agents that access the distributed datasets, and then explains the Bayesian network algorithm for the attribute multi-branch tree, including structure learning and parameter learning. Finally, the paper presents P-DDMB, a prototype distributed Bayesian network system built on Bee-gent. The experimental results show that DDMB provides high capability and efficiency for distributed business data mining.