In many real-world problems, the number of samples differs markedly across classes: the majority classes contain far more samples than the minority classes. Learning from such class-imbalanced data is currently an active research topic in the data mining and machine learning communities. Most previous work on imbalanced learning addresses binary classification. This paper therefore targets the multi-class case and proposes a multi-class imbalanced learning method based on an ensemble of HDDT decision trees. Experiments show that the method learns effectively on multi-class imbalanced problems.
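For readers unfamiliar with HDDT (commonly expanded as Hellinger distance decision tree), the sketch below illustrates the binary Hellinger distance split criterion that such trees are generally built on. It is an illustration only: the abstract does not state how the proposed method extends the criterion to multiple classes or how the ensemble is constructed, so neither is shown here. The function name `hellinger_split_value` and its count-based parameters are hypothetical and chosen purely for this example.

```python
import math


def hellinger_split_value(pos_left, pos_right, neg_left, neg_right):
    """Hellinger distance of a two-way split for a binary problem.

    Hypothetical helper for illustration. Each argument is the number of
    positive (minority) or negative (majority) samples sent to the left
    or right branch by a candidate split.
    """
    pos_total = pos_left + pos_right
    neg_total = neg_left + neg_right
    if pos_total == 0 or neg_total == 0:
        # Assumption: with only one class present, no split is informative.
        return 0.0

    total = 0.0
    for pos_branch, neg_branch in ((pos_left, neg_left), (pos_right, neg_right)):
        # Compare the per-class fractions reaching this branch. Only
        # within-class fractions appear, so class priors do not enter.
        pos_frac = pos_branch / pos_total
        neg_frac = neg_branch / neg_total
        total += (math.sqrt(pos_frac) - math.sqrt(neg_frac)) ** 2
    return math.sqrt(total)


# A split that separates the classes perfectly reaches the maximum, sqrt(2).
perfect = hellinger_split_value(10, 0, 0, 1000)

# A split that sends the same fraction of each class to each branch scores 0.
useless = hellinger_split_value(5, 5, 500, 500)

# Multiplying the majority class by ten leaves the value unchanged.
mild_skew = hellinger_split_value(8, 2, 100, 400)
heavy_skew = hellinger_split_value(8, 2, 1000, 4000)
```

Because the criterion depends only on the fraction of each class reaching a branch, its value does not change when the majority class grows, which is the property that makes it attractive for imbalanced data.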