Ensemble learning is an effective way to improve classifier performance in text categorization, and Bagging is currently one of the most popular ensemble learning algorithms. To address the problem that all weak classifiers in Bagging carry the same weight, an improved Bagging algorithm is proposed. The method computes a confidence value from each weak classifier's classification results and uses that confidence as the classifier's voting weight. It is applied to the Attribute Bagging algorithm to design an automatic Chinese text classifier. With kNN as the base model for the weak classifiers, the classifier is evaluated on the news corpus provided by Sogou Lab. Experiments show that the proposed algorithm achieves better classification accuracy than Attribute Bagging.
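The idea of confidence-weighted voting over attribute subsets can be illustrated with a minimal sketch. The abstract does not say how confidence is computed, so the definition used here (the share of the k nearest neighbours that carry the winning label) is an assumption, as are the function names `knn_predict` and `attribute_bagging_predict`, the parameter defaults, and the toy numeric vectors that stand in for document feature vectors. It is not the authors' implementation.

```python
import math
import random
from collections import Counter, defaultdict


def knn_predict(train_X, train_y, x, features, k):
    """Classify x with kNN restricted to a subset of feature indices.

    Returns (label, confidence). Confidence is ASSUMED to be the share of
    the k nearest neighbours carrying the winning label; the paper's own
    confidence formula is not given in the abstract.
    """
    def dist(a, b):
        # Euclidean distance over the selected features only.
        return math.sqrt(sum((a[f] - b[f]) ** 2 for f in features))

    nearest = sorted(range(len(train_X)), key=lambda i: dist(train_X[i], x))[:k]
    votes = Counter(train_y[i] for i in nearest)
    label, count = votes.most_common(1)[0]
    return label, count / k


def attribute_bagging_predict(train_X, train_y, x, n_estimators=5,
                              subset_size=2, k=3, weighted=True, seed=0):
    """Combine kNN weak classifiers, each trained on a random feature subset.

    weighted=True  -> each vote counts with the weak classifier's confidence.
    weighted=False -> plain Attribute Bagging, every vote counts as 1.
    """
    rng = random.Random(seed)
    n_features = len(train_X[0])
    scores = defaultdict(float)
    for _ in range(n_estimators):
        # Attribute Bagging: sample features (not documents) per classifier.
        features = rng.sample(range(n_features), subset_size)
        label, confidence = knn_predict(train_X, train_y, x, features, k)
        scores[label] += confidence if weighted else 1.0
    return max(scores, key=scores.get)


# Hypothetical toy data: four numeric features per "document".
train_X = [[0, 0, 0, 0], [0, 1, 0, 1], [1, 0, 1, 0],
           [9, 9, 9, 9], [9, 8, 9, 8], [8, 9, 8, 9]]
train_y = ["sports", "sports", "sports", "finance", "finance", "finance"]

print(attribute_bagging_predict(train_X, train_y, [1, 1, 1, 1]))
```

Setting `weighted=False` gives the equal-weight vote of plain Attribute Bagging, which makes it easy to compare the two voting schemes on the same feature subsets.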