Standard support vector machines (SVMs) often perform poorly on imbalanced datasets, whereas a biased-SVM, which assigns a different penalty factor (error cost) to each class, handles such data considerably better. This paper explains why the standard SVM fails on imbalanced datasets, discusses algorithms for solving the biased-SVM, and proposes a method, called "average density", that selects the penalty factors directly from the average density of the data distribution, reducing the time that traditional cross-validation requires for parameter selection. Experimental results show that, compared with the other methods tested, the average density method effectively improves the recognition performance of the biased-SVM on imbalanced datasets.
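For orientation, the biased-SVM with two error costs is commonly written as the soft-margin problem sketched below. This is the standard textbook form, not necessarily the exact formulation used in the paper. The symbols $C^{+}$, $C^{-}$, $\xi_i$, and $\phi$ are notation assumed here for illustration, and the rule that derives the two costs from the average density is not reproduced.

```latex
% Soft-margin SVM with class-dependent error costs (assumed notation):
%   C^{+}, C^{-} : penalty factors for the positive and negative class
%   \xi_i        : slack variable of training sample (x_i, y_i), y_i \in \{+1, -1\}
%   \phi         : feature map induced by the kernel
\begin{aligned}
\min_{w,\,b,\,\xi}\quad
  & \frac{1}{2}\lVert w\rVert^{2}
    + C^{+}\sum_{i:\,y_i=+1}\xi_i
    + C^{-}\sum_{i:\,y_i=-1}\xi_i \\
\text{s.t.}\quad
  & y_i\bigl(w^{\top}\phi(x_i)+b\bigr)\ \ge\ 1-\xi_i,
    \qquad \xi_i \ge 0,\qquad i=1,\dots,n .
\end{aligned}
```

Setting $C^{+}=C^{-}$ recovers the standard SVM. Giving the minority class the larger cost makes its margin violations more expensive, which counteracts the tendency of the decision boundary to drift toward the minority class.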