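Ensemble learning is a machine learning technique in which multiple base classifiers are combined to make a joint decision. Training diverse base classifiers on different sample sets yields an ensemble classifier that can effectively improve learning performance. During base classifier training, imbalanced data can be handled through cost-sensitive techniques and data sampling. Because ensemble learning has advantages in classifying imbalanced data, ensemble classification algorithms for imbalanced data have been widely studied. This paper analyzes in detail the state of research on ensemble classification algorithms for imbalanced data, compares the differences among existing algorithms together with their respective strengths and weaknesses, and identifies and analyzes problems that require further study.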
Ensemble learning, which integrates multiple base classifiers trained on different sample sets, can effectively improve classification accuracy. During base classifier training, imbalanced data sets can be processed with either cost-sensitive or data sampling techniques. Owing to the advantages of ensemble learning in imbalanced data classification, ensemble algorithms for imbalanced data classification have been widely researched. This paper surveys the state of the art of ensemble classification algorithms for imbalanced data, covering the mechanisms and features of the major existing learning algorithms and their advantages and disadvantages, and highlights open research issues and future research directions.
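As an illustration of the data sampling route described above, the following is a minimal sketch rather than any specific algorithm covered by the survey. It assumes a one-feature data set, treats label 1 as the minority class, and uses a simple threshold classifier (a decision stump) as the base learner. All function names (`train_stump`, `balanced_bootstrap`, `train_ensemble`, `predict`) are hypothetical and chosen only for this example. Each base classifier is trained on its own class-balanced resample, and the ensemble decides by majority vote.

```python
import random


def train_stump(samples):
    """Fit a threshold classifier that predicts 1 when x >= threshold."""
    best_threshold, best_errors = None, None
    for threshold in sorted({x for x, _ in samples}):
        errors = sum((x >= threshold) != bool(y) for x, y in samples)
        if best_errors is None or errors < best_errors:
            best_threshold, best_errors = threshold, errors
    return best_threshold


def balanced_bootstrap(samples, rng):
    """Draw a class-balanced resample (assumption: label 1 is the minority)."""
    minority = [s for s in samples if s[1] == 1]
    majority = [s for s in samples if s[1] == 0]
    n = len(minority)
    # Sample both classes with replacement, down to the minority class size.
    return rng.choices(minority, k=n) + rng.choices(majority, k=n)


def train_ensemble(samples, n_estimators=11, seed=0):
    """Train each base classifier on a different balanced sample set."""
    rng = random.Random(seed)
    return [train_stump(balanced_bootstrap(samples, rng))
            for _ in range(n_estimators)]


def predict(ensemble, x):
    """Combine the base classifiers by majority vote."""
    votes = sum(x >= threshold for threshold in ensemble)
    return int(2 * votes > len(ensemble))


# Toy imbalanced data: 100 majority points in [0, 10), 5 minority points in [12, 16].
data = [(i / 10, 0) for i in range(100)] + [(12.0 + i, 1) for i in range(5)]
ensemble = train_ensemble(data)
```

An odd `n_estimators` avoids tied votes. A cost-sensitive variant would instead keep every sample and weight minority-class errors more heavily when fitting each base classifier.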