This paper presents a classification approach based on kernel SMOTE (Synthetic Minority Over-sampling Technique) for handling imbalanced data sets with a Support Vector Machine (SVM). The method first oversamples the minority class in feature space using the kernel SMOTE algorithm, then finds the pre-images of the synthetic instances in input space from a distance relation between feature space and input space. Finally, these pre-images are appended to the original data set, which is used to train an SVM. Experiments on real data sets indicate that the samples constructed by kernel SMOTE are of higher quality than those produced by the SMOTE algorithm, which in turn improves the classification performance of the SVM on imbalanced data sets.
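The three steps named in the abstract (interpolation in feature space, conversion of feature-space distances to input-space distances, pre-image recovery) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a Gaussian (RBF) kernel, for which k(x, x) = 1 and the feature-space distance satisfies d_F^2 = 2 - 2k, and it assumes the pre-image is located by linearised least squares on the converted distances. The abstract specifies neither choice. The names `rbf_kernel`, `preimage_from_distances`, `kernel_smote` and the parameters `k`, `gamma`, `seed` are hypothetical.

```python
import numpy as np

def rbf_kernel(A, B, gamma):
    """Gaussian kernel matrix between the rows of A and the rows of B."""
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def preimage_from_distances(anchors, d2):
    """Least-squares point whose squared distances to `anchors` match `d2`.

    With u = x - a_0 and b_t = a_t - a_0, subtracting the first distance
    equation from the others gives 2 b_t . u = ||b_t||^2 + d_0^2 - d_t^2.
    """
    b = anchors[1:] - anchors[0]
    rhs = np.sum(b**2, axis=1) + d2[0] - d2[1:]
    u, *_ = np.linalg.lstsq(2.0 * b, rhs, rcond=None)
    return anchors[0] + u

def kernel_smote(X_min, n_new, k=5, gamma=1.0, seed=0):
    """Sketch of kernel SMOTE for an RBF kernel (an assumption, see lead-in)."""
    rng = np.random.default_rng(seed)
    n = X_min.shape[0]
    K = rbf_kernel(X_min, X_min, gamma)
    D2 = 2.0 - 2.0 * K                  # feature-space squared distances
    np.fill_diagonal(D2, -1.0)          # force each point to sort first
    order = np.argsort(D2, axis=1)
    synthetic = []
    for _ in range(n_new):
        i = rng.integers(n)
        nbrs = order[i, 1:k + 1]        # k nearest minority neighbours
        j = rng.choice(nbrs)
        delta = rng.random()
        # Synthetic point z = (1 - delta) * phi(x_i) + delta * phi(x_j).
        zz = (1 - delta)**2 + 2 * delta * (1 - delta) * K[i, j] + delta**2
        idx = np.concatenate(([i], nbrs))
        zk = (1 - delta) * K[i, idx] + delta * K[j, idx]
        d2_feat = zz - 2.0 * zk + 1.0   # ||z - phi(x_t)||^2 via the kernel trick
        # Invert d_F^2 = 2 - 2 exp(-gamma d^2), treating the pre-image as unit norm.
        arg = np.clip(1.0 - 0.5 * d2_feat, 1e-12, 1.0)
        d2_in = -np.log(arg) / gamma
        synthetic.append(preimage_from_distances(X_min[idx], d2_in))
    return np.array(synthetic)
```

The returned pre-images would then be appended to the minority class before fitting an SVM, for instance with scikit-learn's `sklearn.svm.SVC`. If `k` is smaller than the input dimension the least-squares system is underdetermined, and `lstsq` returns the solution closest to the seed sample.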