The Tree Augmented Naive Bayesian classifier (TAN) retains the computational simplicity and robustness of the naive Bayesian classifier while often outperforming it in classification. When continuous variables are present, however, TAN requires that they be discretized beforehand. To represent the data distribution more faithfully and reduce information loss, mixed (hybrid) data should be handled directly. In this paper, we derive the maximum likelihood function for hybrid data and propose the Extended Tree Augmented Naive Bayesian classifier (ETAN). ETAN removes the requirement that continuous variables be pre-discretized and handles hybrid variables within the TAN framework. Experiments show that it achieves good classification accuracy.
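To make concrete what scoring hybrid data without pre-discretization means, the following is a minimal sketch and not the ETAN method itself. It assumes a plain naive Bayesian structure with no tree-augmenting edges between attributes. Each discrete attribute is modeled by a class-conditional categorical distribution, and each continuous attribute by a class-conditional Gaussian. All function and parameter names (`fit_hybrid_nb`, `class_log_scores`, `predict`, `alpha`) are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def fit_hybrid_nb(rows, labels, discrete_idx, continuous_idx, alpha=1.0):
    """Fit class priors, categorical tables, and Gaussian parameters.

    Assumption: naive Bayes structure only (no augmenting tree edges).
    Discrete counts use Laplace smoothing with strength `alpha`;
    Gaussian mean and variance are maximum likelihood estimates.
    """
    n = len(labels)
    model = {"prior": {}, "cat": {}, "gauss": {}}
    # Observed value set of each discrete attribute, shared across classes.
    values = {j: sorted({r[j] for r in rows}) for j in discrete_idx}
    for c in sorted(set(labels)):
        members = [r for r, y in zip(rows, labels) if y == c]
        model["prior"][c] = len(members) / n
        for j in discrete_idx:
            counts = Counter(r[j] for r in members)
            denom = len(members) + alpha * len(values[j])
            model["cat"][(c, j)] = {
                v: (counts[v] + alpha) / denom for v in values[j]
            }
        for j in continuous_idx:
            xs = [r[j] for r in members]
            mu = sum(xs) / len(xs)
            var = sum((x - mu) ** 2 for x in xs) / len(xs)
            # Floor the variance so the log-density stays finite.
            model["gauss"][(c, j)] = (mu, max(var, 1e-9))
    return model


def class_log_scores(model, row, discrete_idx, continuous_idx):
    """Return the unnormalized log joint score log P(c) + sum_j log p(x_j | c)."""
    scores = {}
    for c, prior in model["prior"].items():
        s = math.log(prior)
        for j in discrete_idx:
            # Raises KeyError for a discrete value never seen in training.
            s += math.log(model["cat"][(c, j)][row[j]])
        for j in continuous_idx:
            mu, var = model["gauss"][(c, j)]
            s += -0.5 * math.log(2 * math.pi * var) - (row[j] - mu) ** 2 / (2 * var)
        scores[c] = s
    return scores


def predict(model, row, discrete_idx, continuous_idx):
    """Pick the class with the highest log score."""
    scores = class_log_scores(model, row, discrete_idx, continuous_idx)
    return max(scores, key=scores.get)
```

The continuous attribute enters the score through its density rather than through bin counts, so no discretization step is needed. A TAN-style extension would additionally condition each attribute on one parent attribute; how ETAN does this for hybrid parent and child pairs is not specified in the abstract and is not reproduced here.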