Introducing self-organizing data mining (SODM) theory into Bayesian classification, this paper proposes GMBC-BDE, a novel structure learning algorithm for Bayesian classifiers. GMBC-BDE combines the two main approaches to Bayesian network structure learning, dependency analysis and search-and-scoring. It selects initial models according to mutual information and uses the Bayesian (BDE) score as the external criterion for screening intermediate models. This lets it carry out the modeling process adaptively on different data sets, including selecting which variables enter the model and determining the model structure of optimal complexity. Classification experiments on 10 UCI data sets show that GMBC-BDE classifiers achieve higher classification accuracy than the widely used Naive Bayes (NB) and Tree Augmented Naive Bayes (TAN) classifiers, and than classifiers based on the K2 algorithm. Further learning-curve and anti-interference (noise robustness) experiments were run on the credit card customer classification data set “german”. Compared with NB, TAN, and K2 classifiers, GMBC-BDE classifiers showed more stable classification performance and stronger anti-interference ability.
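The abstract does not give GMBC-BDE's formulas. The Python sketch below illustrates, under stated assumptions, two ingredients it names.

The first is ranking candidate attributes by their empirical mutual information with the class. This is one plausible way to choose initial models.

The second is scoring a node's parent set with a Bayesian Dirichlet score. Here the BDeu variant is used, on the assumption that "BDE" denotes a BDe-family score. The equivalent sample size `ess=1.0` is an arbitrary choice.

The function names and toy data are hypothetical. This is not the authors' implementation, and the self-organizing search that generates and screens intermediate models is omitted.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Empirical mutual information I(X; Y), in nats, of two discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    mi = 0.0
    for (x, y), nxy in joint.items():
        # p(x,y) * log(p(x,y) / (p(x) p(y))), written with raw counts
        mi += (nxy / n) * math.log(nxy * n / (px[x] * py[y]))
    return mi


def bdeu_local_score(rows, child, parents, arity, ess=1.0):
    """Log BDeu score of `child` given `parents` (assumed BDe-family criterion).

    rows:  list of dicts mapping variable name -> state in range(arity[var])
    arity: dict mapping variable name -> number of states
    """
    r = arity[child]
    q = math.prod(arity[p] for p in parents)  # parent configurations (1 if none)
    a_ij = ess / q          # prior mass per parent configuration
    a_ijk = ess / (q * r)   # prior mass per (configuration, child state)
    counts = Counter((tuple(row[p] for p in parents), row[child]) for row in rows)
    totals = Counter()
    for (cfg, _), c in counts.items():
        totals[cfg] += c
    score = 0.0
    # Unobserved parent configurations contribute exactly 0, so they are skipped.
    for cfg, n_ij in totals.items():
        score += math.lgamma(a_ij) - math.lgamma(a_ij + n_ij)
        for k in range(r):
            n_ijk = counts.get((cfg, k), 0)
            score += math.lgamma(a_ijk + n_ijk) - math.lgamma(a_ijk)
    return score


# Hypothetical toy data: attribute A copies class C, attribute B is independent of C.
cols = {
    "C": [0, 0, 0, 0, 1, 1, 1, 1],
    "A": [0, 0, 0, 0, 1, 1, 1, 1],
    "B": [0, 1, 0, 1, 0, 1, 0, 1],
}
arity = {"C": 2, "A": 2, "B": 2}
rows = [dict(zip(cols, values)) for values in zip(*cols.values())]

# Rank candidate attributes by mutual information with the class.
ranking = sorted(["A", "B"], key=lambda v: mutual_information(cols[v], cols["C"]), reverse=True)
print("MI ranking:", ranking)

# Use the score as an external criterion: keep the edge C -> X only if it helps.
for var in ranking:
    with_c = bdeu_local_score(rows, var, ["C"], arity)
    without_c = bdeu_local_score(rows, var, [], arity)
    print(var, "keep C ->", var, ":", with_c > without_c)
```

On this toy data, A ranks first by mutual information. The BDeu comparison favours the edge C → A but rejects C → B. In this sketch, a candidate structure survives only when it improves the score, which is how an external criterion screens models.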