Bayesian classification is a simple and effective data mining method with a rigorous mathematical foundation. However, the Bayesian classifier relies on the conditional independence assumption and gives every attribute a weight of 1, and both constraints considerably reduce its performance. To address these limitations, this paper proposes an Optimal Bayes Classification Algorithm (OBCA). First, rough set theory is used to perform attribute reduction on the dataset to be classified, removing redundant attributes. Then, methods and formulas for computing attribute weights are given, so that the importance and correlation of the dataset's attributes are described more accurately. Finally, OBCA is evaluated in comparison tests with the Weka 3.6.2 tool on datasets from the UCI Machine Learning Repository. The experimental results show that OBCA achieves higher classification accuracy than the classifiers it is compared with.
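The abstract does not spell out the rough-set reduction procedure. A common choice, used here only as an assumed stand-in, is the greedy QuickReduct algorithm. It adds attributes one at a time until the dependency degree of the class on the chosen subset B, γ_B(D) = |POS_B(D)| / |U|, equals that of the full attribute set. The sketch assumes categorical attributes stored as tuples. `partition`, `dependency`, and `quick_reduct` are hypothetical names and are not part of OBCA or Weka.

```python
from collections import defaultdict


def partition(rows, attrs):
    """Group row indices into equivalence classes of the indiscernibility relation IND(attrs)."""
    blocks = defaultdict(list)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].append(i)
    return list(blocks.values())


def dependency(rows, labels, attrs):
    """Rough-set dependency degree gamma_attrs(D) = |POS_attrs(D)| / |U|."""
    pos = 0
    for block in partition(rows, attrs):
        # A block belongs to the positive region if all of its objects share one class label.
        if len({labels[i] for i in block}) == 1:
            pos += len(block)
    return pos / len(rows)


def quick_reduct(rows, labels, n_attrs):
    """Greedy QuickReduct: keep adding the attribute that most increases gamma."""
    full = dependency(rows, labels, list(range(n_attrs)))
    reduct = []
    while dependency(rows, labels, reduct) < full:
        best, best_gamma = None, -1.0
        for a in range(n_attrs):
            if a in reduct:
                continue
            g = dependency(rows, labels, reduct + [a])
            if g > best_gamma:
                best, best_gamma = a, g
        reduct.append(best)
    return sorted(reduct)


if __name__ == "__main__":
    # Toy table: attribute 2 duplicates attribute 0, and attribute 1 does not separate the classes.
    rows = [(0, 0, 0), (0, 1, 0), (1, 0, 1), (1, 1, 1)]
    labels = ["no", "no", "yes", "yes"]
    print(quick_reduct(rows, labels, 3))  # → [0]
```

QuickReduct is fast but greedy, so it returns a subset that preserves the full dependency degree without guaranteeing the smallest such subset.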
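The point about attribute weights can be stated precisely. Naive Bayes predicts argmax_c P(c) ∏_i P(x_i | c), which is the special case w_i = 1 of the weighted rule argmax_c P(c) ∏_i P(x_i | c)^{w_i}. The abstract does not give OBCA's weight formula, so this sketch takes the weights as input. The class name `WeightedNaiveBayes`, the Laplace smoothing, and the example weights are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


class WeightedNaiveBayes:
    """Categorical naive Bayes in which attribute i contributes P(x_i | c) ** w_i.

    With every weight equal to 1 this reduces to ordinary naive Bayes.
    """

    def __init__(self, weights, alpha=1.0):
        self.weights = list(weights)  # one weight per attribute (hypothetical values)
        self.alpha = alpha            # Laplace smoothing constant (assumed)

    def fit(self, rows, labels):
        self.n = len(rows)
        self.class_counts = Counter(labels)
        # counts[i][c][v] = number of training rows of class c whose attribute i equals v
        self.counts = [defaultdict(Counter) for _ in self.weights]
        self.values = [set() for _ in self.weights]
        for row, c in zip(rows, labels):
            for i, v in enumerate(row):
                self.counts[i][c][v] += 1
                self.values[i].add(v)
        return self

    def log_score(self, row, c):
        nc = self.class_counts[c]
        score = math.log(nc / self.n)  # log P(c)
        for i, v in enumerate(row):
            k = len(self.values[i])
            p = (self.counts[i][c][v] + self.alpha) / (nc + self.alpha * k)
            score += self.weights[i] * math.log(p)  # w_i * log P(x_i | c)
        return score

    def predict(self, row):
        return max(self.class_counts, key=lambda c: self.log_score(row, c))


if __name__ == "__main__":
    rows = [(0, 0), (0, 0), (0, 0), (1, 0), (1, 1), (1, 1), (1, 1), (0, 1)]
    labels = ["no"] * 4 + ["yes"] * 4
    print(WeightedNaiveBayes([1.0, 1.0]).fit(rows, labels).predict((0, 1)))  # → yes
    print(WeightedNaiveBayes([1.0, 0.0]).fit(rows, labels).predict((0, 1)))  # → no
```

Scores are summed in log space to avoid underflow. A weight of 0 removes an attribute entirely, which is what the reduction step does for redundant attributes. Intermediate weights only scale an attribute's influence.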