Classifiers trained with traditional Boosting algorithms tend to overfit and to be biased towards the majority class when the training set is small and imbalanced. To address this, a Boosting learning algorithm based on adaptive sample injection and feature knockout is proposed. During training, synthetic samples are added to the training set so that it is gradually rebalanced. The synthetic samples also perturb classifier learning, which leads the classifier to select more of the effective features and improves its generalization ability. The method was evaluated on two-class and multi-class image classification problems. Experimental results show that when the training samples are very few and the numbers of positive and negative samples are highly imbalanced, the proposed method effectively improves the generalization ability of Boosting algorithms.
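The injection step described in the abstract can be illustrated with a minimal sketch. It is one plausible reading, not the authors' implementation: a synthetic minority sample is assumed to be a copy of a real minority sample with a single feature overwritten by another sample's value, and a fixed number of such samples is assumed to be added per round until the class counts match. The names `knockout_sample` and `inject_round`, the donor pool, and the per-round quota are all hypothetical, and the Boosting weight update itself is omitted.

```python
import random


def knockout_sample(minority, donors, rng):
    """Copy a random minority sample and overwrite one feature with a donor's value."""
    base = list(rng.choice(minority))
    donor = rng.choice(donors)
    j = rng.randrange(len(base))  # index of the feature to replace
    base[j] = donor[j]
    return base


def inject_round(minority, majority, n_new, rng):
    """Return the minority set plus up to n_new synthetic samples.

    The number added is capped by the remaining class deficit, so the
    minority class never grows larger than the majority class.
    """
    pool = minority + majority  # assumption: donors come from both classes
    deficit = max(0, len(majority) - len(minority))
    new = [knockout_sample(minority, pool, rng) for _ in range(min(n_new, deficit))]
    return minority + new


rng = random.Random(0)
minority = [[1.0, 2.0], [1.5, 2.5]]
majority = [[5.0, 6.0], [5.5, 6.5], [6.0, 7.0],
            [6.5, 7.5], [7.0, 8.0], [7.5, 8.5]]

# Each loop iteration stands in for one Boosting round (weak learner omitted).
for _ in range(3):
    minority = inject_round(minority, majority, 2, rng)

print(len(minority), len(majority))  # prints: 6 6
```

Here the quota is fixed at two samples per round for simplicity. In an adaptive scheme it would presumably depend on the current state of the classifier, which the abstract does not specify.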