AdaBoost is one of the most successful Boosting algorithms. It rests on a solid theoretical foundation and has been widely and effectively applied in practice. The algorithm can boost a weak learner whose accuracy is only slightly better than random guessing into an arbitrarily accurate strong learner, offering new ideas and new methods for the design of learning algorithms. This paper first reviews how the Boosting conjecture was proposed and subsequently proved, and how that proof led to the origin of AdaBoost and its initial design ideas. Second, it introduces analyses of the training error and generalization error of AdaBoost, which explain why the algorithm can improve learning accuracy. Third, it surveys the different theoretical models used to analyze AdaBoost, together with the variant algorithms derived from these models. Fourth, it describes the extension of AdaBoost from binary to multiclass classification, and also reviews applications of AdaBoost and its variants to practical problems. Centering on AdaBoost and its variants, the paper introduces Boosting theory, which occupies an important position in ensemble learning, and traces the development of Boosting research so as to provide useful leads for interested researchers. Finally, directions that deserve further study are discussed: deriving tighter generalization error bounds, characterizing the weak-learner condition in multiclass problems, designing loss functions better suited to multiclass problems, formulating more precise stopping conditions for the iterations, strengthening robustness to noise, and optimizing AdaBoost from the perspective of the diversity of its base classifiers.
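For readers unfamiliar with the procedure the survey analyzes, the following is a minimal sketch of standard binary AdaBoost (Freund and Schapire) with decision stumps as weak learners. It is an illustrative reconstruction under our own assumptions, not code from the paper; the helper names train_stump, adaboost, and predict are hypothetical.

import numpy as np

def train_stump(X, y, w):
    """Weak learner: exhaustively pick the one-feature threshold stump
    with the lowest weighted error. Returns (feature, threshold, polarity, error)."""
    n, d = X.shape
    best = (0, 0.0, 1, np.inf)
    for j in range(d):
        for thr in np.unique(X[:, j]):
            for pol in (1, -1):
                pred = np.where(pol * (X[:, j] - thr) >= 0, 1, -1)
                err = np.sum(w[pred != y])
                if err < best[3]:
                    best = (j, thr, pol, err)
    return best

def adaboost(X, y, T=20):
    """Labels y must be in {-1, +1}. Returns a list of (alpha, stump) pairs."""
    n = len(y)
    w = np.full(n, 1.0 / n)                      # uniform initial distribution D_1
    ensemble = []
    for t in range(T):
        j, thr, pol, err = train_stump(X, y, w)
        err = max(err, 1e-10)                    # guard against a perfect stump
        alpha = 0.5 * np.log((1 - err) / err)    # classifier weight alpha_t
        pred = np.where(pol * (X[:, j] - thr) >= 0, 1, -1)
        w *= np.exp(-alpha * y * pred)           # up-weight the mistakes
        w /= w.sum()                             # renormalize to get D_{t+1}
        ensemble.append((alpha, (j, thr, pol)))
    return ensemble

def predict(ensemble, X):
    """Sign of the weighted vote H(x) = sign(sum_t alpha_t * h_t(x))."""
    score = np.zeros(len(X))
    for alpha, (j, thr, pol) in ensemble:
        score += alpha * np.where(pol * (X[:, j] - thr) >= 0, 1, -1)
    return np.sign(score)

# Usage: ensemble = adaboost(X_train, y_train, T=50); y_hat = predict(ensemble, X_test)

The exponential reweighting step is what drives the training-error analysis mentioned above: writing the edge of round t as gamma_t = 1/2 - epsilon_t, the classical bound of Freund and Schapire states that the training error of the combined classifier is at most the product over t of 2*sqrt(epsilon_t*(1 - epsilon_t)), which is at most exp(-2 * sum_t gamma_t^2), so the error drops exponentially as long as each weak learner keeps a positive edge.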