Naive Bayes is a simple and efficient classification method. However, when the attribute-independence assumption does not hold, it may assign a test sample to the wrong class. When a test sample has the same probability for every class, it cannot decide the sample's class at all. Both problems reduce its classification accuracy. This paper proposes an improved naive Bayes algorithm based on the contribution rate of attribute values. The algorithm assigns a test sample to a class according to the total contribution rate of all of the sample's attribute values to each class. Experiments on the mushroom dataset show that the improved algorithm effectively increases classification accuracy.
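The sketch below contrasts the two decision rules on categorical data. It is illustrative only. The abstract does not give the paper's formula, so the "contribution rate" here is an assumption. It is defined as the share of training samples with a given attribute value that belong to class c, an estimate of P(c | x_j), summed over the test sample's attributes. The function names `train`, `naive_bayes_scores`, `contribution_scores`, and `predict` are hypothetical. The toy data is invented and is not drawn from the mushroom dataset. Plain naive Bayes is shown without smoothing. `fractions.Fraction` keeps the arithmetic exact, so the tie described above shows up as an exact equality rather than a floating-point near-miss.

```python
from collections import Counter, defaultdict
from fractions import Fraction


def train(samples, labels):
    """Count class frequencies and, for every (attribute index, value) pair,
    how often it co-occurs with each class."""
    class_counts = Counter(labels)
    value_class_counts = defaultdict(Counter)  # (j, value) -> Counter{class: n}
    for x, c in zip(samples, labels):
        for j, v in enumerate(x):
            value_class_counts[(j, v)][c] += 1
    return class_counts, value_class_counts


def naive_bayes_scores(x, class_counts, value_class_counts):
    """Unnormalised naive Bayes posterior P(c) * prod_j P(x_j | c).
    No smoothing, so an unseen value drives the product to zero."""
    n = sum(class_counts.values())
    scores = {}
    for c, n_c in class_counts.items():
        s = Fraction(n_c, n)
        for j, v in enumerate(x):
            s *= Fraction(value_class_counts.get((j, v), Counter())[c], n_c)
        scores[c] = s
    return scores


def contribution_scores(x, class_counts, value_class_counts):
    """Hypothetical contribution-rate score (an assumption, not the paper's
    definition): sum over attributes of the share of training samples with
    value x_j that belong to class c. Unseen values contribute nothing."""
    scores = {c: Fraction(0) for c in class_counts}
    for j, v in enumerate(x):
        counts = value_class_counts.get((j, v))
        if not counts:
            continue
        total = sum(counts.values())
        for c in scores:
            scores[c] += Fraction(counts[c], total)
    return scores


def predict(scores):
    """Return the single best class, or None when the top score is tied."""
    best = max(scores.values())
    winners = [c for c, s in scores.items() if s == best]
    return winners[0] if len(winners) == 1 else None


if __name__ == "__main__":
    # Invented toy data with two categorical attributes and the classes
    # "e" (edible) and "p" (poisonous).
    X = [("a", "y"), ("b", "x"),
         ("a", "y"), ("b", "x"), ("b", "x"), ("b", "y")]
    y = ["e", "e", "p", "p", "p", "p"]
    model = train(X, y)
    test = ("a", "x")
    nb = naive_bayes_scores(test, *model)
    cr = contribution_scores(test, *model)
    print(nb, predict(nb))  # both classes score 1/12, so no decision (None)
    print(cr, predict(cr))  # e: 5/6, p: 7/6, so "p"
```

In this toy case both classes get the naive Bayes score 1/12, so that rule cannot decide. The summed per-value shares differ (5/6 versus 7/6), so the alternative rule picks "p". Summing per-attribute terms instead of multiplying them also means a single unseen or rare value does not zero out a class.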