The support vector machine (SVM) is a supervised classification algorithm with distinctive advantages on small-sample and nonlinear problems, but its performance on imbalanced data is unsatisfactory. Under-sampling is a common data reconstruction technique that is widely used to address imbalanced classification; however, traditional random under-sampling is affected by its inherent randomness and is therefore unstable. This paper proposes an improved under-sampling method and applies it to SVM in comparative classification experiments. Experiments on real-world datasets show that, compared with traditional random under-sampling, the proposed method is more stable and in many cases improves the classification performance of SVM on imbalanced data.
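For readers unfamiliar with the baseline, the following is a minimal sketch of traditional random under-sampling only; it is not the improved method proposed here, which the abstract does not specify. The function name `random_undersample`, the toy data, and the choice to shrink every larger class to the size of the smallest class are illustrative assumptions.

```python
import random
from collections import Counter


def random_undersample(X, y, seed=None):
    """Baseline random under-sampling (illustrative sketch, not the paper's method).

    Every class larger than the smallest class is randomly reduced to the
    smallest class size; the smallest class is kept in full.
    Returns the reduced samples, reduced labels, and the kept indices.
    """
    rng = random.Random(seed)

    # Group sample indices by class label.
    by_class = {}
    for idx, label in enumerate(y):
        by_class.setdefault(label, []).append(idx)

    n_min = min(len(ids) for ids in by_class.values())

    kept = []
    for ids in by_class.values():
        if len(ids) == n_min:
            kept.extend(ids)                     # keep the minority class whole
        else:
            kept.extend(rng.sample(ids, n_min))  # random subset of a larger class
    kept.sort()

    return [X[i] for i in kept], [y[i] for i in kept], kept


# Hypothetical toy data: 90 majority samples (label 0), 10 minority samples (label 1).
X = [[float(i)] for i in range(100)]
y = [0] * 90 + [1] * 10

X_a, y_a, kept_a = random_undersample(X, y, seed=0)
X_b, y_b, kept_b = random_undersample(X, y, seed=1)

print(Counter(y_a))
# Number of majority samples shared by the two draws; the retained majority
# subset depends on the seed, which is the source of instability.
print(len(set(kept_a) & set(kept_b)) - 10)
```

Because the retained majority samples change with the random seed, a classifier such as an SVM trained on each draw may learn a different decision boundary, which is the instability that the abstract attributes to random under-sampling.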