Stagewise Additive Modeling using a Multi-class Exponential loss function (SAMME) is a multi-class Adaptive Boosting (AdaBoost) algorithm. To further improve the performance of SAMME, the influence of using weighted probability and pseudo-loss on the algorithm was studied, and on this basis a dynamically weighted AdaBoost algorithm, SAMME with Resampling and Dynamic weighting (SAMME.RD), was proposed, in which each base classifier is weighted according to how it classifies the effective neighborhood of a sample. Firstly, it is determined whether to use weighted probability and pseudo-loss. Then, the effective neighborhood of the sample to be tested is found in the training set. Finally, the weighting coefficient of each base classifier is determined from that classifier's classification results on the effective neighborhood. Experiments on UCI datasets show that computing the weighting coefficients of the base classifiers from the real error rate gives better results; that selecting base classifiers by real probability works better when the dataset has fewer classes and a balanced distribution; and that selecting base classifiers by weighted probability works better when the dataset has more classes and an imbalanced distribution. The proposed SAMME.RD algorithm can effectively improve the classification accuracy of multi-class AdaBoost.
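The last two steps described above (finding a neighborhood of the test sample in the training set, then weighting each base classifier by how it classifies that neighborhood) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not define the "effective" neighborhood or the exact weighting formula, so the sketch assumes a plain k-nearest-neighbor neighborhood under Euclidean distance and applies the standard SAMME weight, log((1 - err) / err) + log(K - 1), to the local error rate. All function and parameter names (`samme_alpha`, `k_nearest_indices`, `dynamic_weighted_predict`, `k`) are hypothetical.

```python
import numpy as np


def samme_alpha(error_rate, n_classes, eps=1e-10):
    """Standard SAMME classifier weight: log((1 - err) / err) + log(K - 1)."""
    err = np.clip(error_rate, eps, 1.0 - eps)  # avoid log(0) and division by zero
    return np.log((1.0 - err) / err) + np.log(n_classes - 1.0)


def k_nearest_indices(X_train, x, k):
    """Indices of the k training points closest to x (Euclidean distance).

    Assumption: a plain k-NN neighborhood stands in for the paper's
    'effective neighborhood', whose definition the abstract does not give.
    """
    dists = np.linalg.norm(X_train - x, axis=1)
    return np.argsort(dists, kind="stable")[:k]


def dynamic_weighted_predict(x, X_train, y_train, train_preds, test_preds,
                             n_classes, k=5):
    """Weighted vote in which each classifier's weight depends on x.

    train_preds: (n_classifiers, n_train) predictions on the training set
    test_preds:  (n_classifiers,) each classifier's predicted label for x
    """
    idx = k_nearest_indices(X_train, x, k)
    votes = np.zeros(n_classes)
    for preds_on_train, pred_for_x in zip(train_preds, test_preds):
        # Real (unweighted) error rate of this classifier on the neighborhood.
        local_err = np.mean(preds_on_train[idx] != y_train[idx])
        # Assumption: classifiers no better than random guessing locally
        # (non-positive weight) are ignored rather than counted negatively.
        alpha = max(samme_alpha(local_err, n_classes), 0.0)
        votes[pred_for_x] += alpha
    return int(np.argmax(votes))
```

In this sketch a base classifier that is accurate near the test sample dominates the vote there, even if its global error rate is high; a static SAMME ensemble would instead use one fixed weight per classifier for every test sample.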