Imbalanced data are pervasive in many important real-world domains, yet standard classification learning algorithms perform markedly worse on imbalanced problems. To address this problem, a new minority-class borderline synthetic oversampling method named BOS is proposed. BOS uses a newly defined concept, K generalized Tomek links (K links for short), to locate borderline instances effectively, and then adaptively synthesizes minority borderline instances according to the distribution of K links over the minority class. Experimental results show that BOS achieves better area under the receiver operating characteristic curve (AUC), F-measure, and geometric mean (G-mean) values than several existing typical oversampling methods.
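
As a rough illustration of the idea summarized above, the following Python sketch locates borderline minority instances and oversamples them in proportion to their link counts. The abstract does not define K links, so the sketch assumes one plausible reading: a minority instance and a majority instance are linked when each is among the other's k nearest neighbours (for k = 1 this is the classic Tomek link). The function names `k_link_counts` and `borderline_oversample`, the parameter `n_new`, and the SMOTE-style interpolation step are all assumptions made for illustration, not the authors' BOS implementation.

```python
import numpy as np


def k_link_counts(X, y, k=3):
    """Count assumed 'K links' per instance.

    Assumption (not from the source): two instances of different classes
    form a K link when each is among the other's k nearest neighbours.
    """
    n = len(X)
    k = min(k, n - 1)
    # Pairwise Euclidean distances; an instance is never its own neighbour.
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)
    knn = np.argsort(d, axis=1)[:, :k]
    is_nn = np.zeros((n, n), dtype=bool)
    is_nn[np.repeat(np.arange(n), k), knn.ravel()] = True
    mutual = is_nn & is_nn.T               # each is in the other's k-NN list
    opposite = y[:, None] != y[None, :]    # the pair crosses the class boundary
    return (mutual & opposite).sum(axis=1)


def borderline_oversample(X, y, minority=1, k=3, n_new=None, seed=0):
    """Add n_new synthetic minority points, seeded at linked instances.

    Seeds are drawn with probability proportional to their link count
    (an assumed form of the 'adaptive' step), then interpolated towards
    one of their k nearest minority neighbours, as in SMOTE.
    """
    rng = np.random.default_rng(seed)
    min_idx = np.flatnonzero(y == minority)
    if n_new is None:
        # Default: generate enough points to balance the two classes.
        n_new = int((y != minority).sum() - len(min_idx))
    weights = k_link_counts(X, y, k)[min_idx].astype(float)
    if n_new <= 0 or weights.sum() == 0 or len(min_idx) < 2:
        return X.copy(), y.copy()

    seeds = rng.choice(min_idx, size=n_new, p=weights / weights.sum())
    X_min = X[min_idx]
    synthetic = []
    for s in seeds:
        dist = np.linalg.norm(X_min - X[s], axis=1)
        neighbours = np.argsort(dist)[1:k + 1]   # skip the seed itself
        nb = X_min[rng.choice(neighbours)]
        synthetic.append(X[s] + rng.random() * (nb - X[s]))

    X_out = np.vstack([X, np.array(synthetic)])
    y_out = np.concatenate([y, np.full(n_new, minority, dtype=y.dtype)])
    return X_out, y_out
```

Minority instances with no links receive zero sampling weight in this sketch, so all synthetic points originate near the class boundary; whether BOS treats such instances the same way is not stated in the abstract.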