In recent years, the class imbalance problem has become a research hotspot in artificial intelligence, machine learning, and data mining, and many practical and effective methods have been proposed for it. However, recent studies have shown that not all imbalanced classification tasks are actually harmful: applying specially designed class imbalance learning algorithms to harmless tasks rarely improves, and may even degrade, classification performance, while potentially increasing training time substantially. To address this problem, we propose a harm pre-evaluation strategy. The strategy uses leave-one-out cross validation (LOOCV) to measure the classification performance on the training set, and from this performance computes a new index, called the Harmfulness Measure (HM), that quantifies the degree of harm and thereby guides the selection of an appropriate learning algorithm. Experiments on eight class-imbalanced datasets show that the proposed strategy is effective and feasible.
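The pre-evaluation procedure described above can be sketched in a few lines. The abstract does not give the actual HM formula, so the `hm` function below is a hypothetical placeholder that contrasts minority-class recall with overall accuracy under LOOCV on the training set; the dataset, classifier, and threshold logic are likewise illustrative assumptions, not the paper's method.

```python
# Illustrative sketch of an LOOCV-based harm pre-evaluation.
# NOTE: hm() below is a hypothetical stand-in for the paper's
# Harmfulness Measure, whose exact definition is not given here.
from sklearn.datasets import make_classification
from sklearn.model_selection import LeaveOneOut, cross_val_predict
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, recall_score

# A small skewed toy dataset (roughly 9:1 class ratio).
X, y = make_classification(n_samples=200, weights=[0.9, 0.1], random_state=0)

# Step 1: obtain LOOCV predictions on the training set.
pred = cross_val_predict(DecisionTreeClassifier(random_state=0), X, y,
                         cv=LeaveOneOut())

def hm(y_true, y_pred):
    """Hypothetical harmfulness score: a large gap between overall
    accuracy and minority-class recall suggests the imbalance is
    genuinely harmful to the classifier."""
    return (accuracy_score(y_true, y_pred)
            - recall_score(y_true, y_pred, pos_label=1))

# Step 2: quantify harm; step 3: use it to guide algorithm choice.
score = hm(y, pred)
# A score near zero suggests plain learning suffices; a large positive
# score argues for a class-imbalance learning algorithm.
print(round(score, 3))
```

In this sketch the decision rule is a single scalar comparison, which matches the abstract's intent (guide the choice of learning algorithm) without claiming the paper's actual threshold or formula.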