To further improve the classification accuracy of multi-relational Naive Bayesian classifiers, existing pruning methods are analyzed and the mutual information criterion for attribute filtering is extended so that it applies directly to multi-relational data. Building on the tuple ID propagation method and tuple-oriented counting, the method and steps for attribute selection based on the extended mutual information criterion are given, and a multi-relational Naive Bayesian classifier based on extended mutual information (MI-MRNBC) is implemented. Experiments on standard datasets show that attribute selection with the extended mutual information criterion finds the attributes most relevant to the class attribute without increasing the time complexity of the algorithm. It also achieves high classification accuracy even when very few attributes take part in classification. Experiments on the Mutagenesis dataset further show that this attribute selection can reduce the multi-relational problem to a single-relational one, which greatly lowers the cost of classification.
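The abstract names the ingredients of the filter (tuple ID propagation, tuple-oriented counting, mutual information) but not their exact definitions. The Python sketch below shows one plausible reading of how they combine to score a non-target attribute A against the class C using I(A;C) = Σ p(v,c) · log2[p(v,c) / (p(v)p(c))]. It rests on several assumptions that are not taken from the paper:

- Relations are represented as lists of dicts, with a single foreign-key join path.
- N(v, c) is defined as the number of distinct target tuples of class c linked to at least one tuple whose value of A is v.
- p(v, c) is taken as N(v, c) divided by the sum of all such counts.

All function names and the toy molecule/atom data are hypothetical.

```python
from collections import defaultdict
from math import log2


def propagate_ids(src_rows, src_ids, src_key, dst_rows, dst_key):
    """Tuple ID propagation: each destination row receives the union of the
    target-tuple IDs carried by every source row it joins with
    (src_row[src_key] == dst_row[dst_key]), so no physical join is built."""
    ids_by_key = defaultdict(set)
    for row, ids in zip(src_rows, src_ids):
        ids_by_key[row[src_key]] |= ids
    return [set(ids_by_key.get(row[dst_key], ())) for row in dst_rows]


def tuple_counts(rows, ids, attr, labels):
    """Tuple-oriented counting (assumed definition): counts[v][c] is the number
    of distinct target tuples of class c linked to at least one row whose
    attribute `attr` equals v."""
    linked = defaultdict(set)
    for row, tids in zip(rows, ids):
        linked[row[attr]] |= tids
    counts = defaultdict(lambda: defaultdict(int))
    for value, tids in linked.items():
        for tid in tids:
            counts[value][labels[tid]] += 1
    return counts


def mutual_information(counts):
    """I(A;C) in bits, with p(v, c) = N(v, c) / sum of all N(v, c)."""
    total = sum(n for by_class in counts.values() for n in by_class.values())
    if total == 0:
        return 0.0
    p_value = {v: sum(by_class.values()) / total for v, by_class in counts.items()}
    p_class = defaultdict(float)
    for by_class in counts.values():
        for c, n in by_class.items():
            p_class[c] += n / total
    mi = 0.0
    for v, by_class in counts.items():
        for c, n in by_class.items():
            p_vc = n / total
            mi += p_vc * log2(p_vc / (p_value[v] * p_class[c]))
    return mi


def rank_attributes(rows, ids, attrs, labels):
    """Score each attribute by the extended MI and return (ranking, scores)."""
    scores = {a: mutual_information(tuple_counts(rows, ids, a, labels)) for a in attrs}
    return sorted(attrs, key=scores.get, reverse=True), scores


# Hypothetical toy data: a target relation of molecules and a non-target
# relation of atoms that references it through the foreign key "mol".
molecules = [{"id": 0, "label": "pos"}, {"id": 1, "label": "pos"},
             {"id": 2, "label": "neg"}, {"id": 3, "label": "neg"}]
atoms = [{"mol": 0, "elem": "c", "charge": "hi"},
         {"mol": 0, "elem": "n", "charge": "lo"},
         {"mol": 1, "elem": "n", "charge": "hi"},
         {"mol": 2, "elem": "c", "charge": "lo"},
         {"mol": 3, "elem": "o", "charge": "lo"}]

labels = {m["id"]: m["label"] for m in molecules}
target_ids = [{m["id"]} for m in molecules]  # each target tuple carries its own ID
atom_ids = propagate_ids(molecules, target_ids, "id", atoms, "mol")
ranking, scores = rank_attributes(atoms, atom_ids, ["elem", "charge"], labels)
print(ranking, {a: round(s, 3) for a, s in scores.items()})
```

Calling `propagate_ids` repeatedly along a longer join path carries the target IDs further without materializing any join. Counting distinct target tuples, rather than joined rows, keeps a molecule with many atoms from dominating the statistics. Both are design choices of this sketch, not claims about MI-MRNBC.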