An algorithm is proposed for adjusting the separating hyperplane of a support vector machine (SVM) trained on imbalanced two-class data. First, a standard SVM is trained on the original samples to obtain a preliminary separating hyperplane and its normal vector. Second, the high-dimensional samples are projected onto this normal vector, yielding one-dimensional data. Third, the ratio of the penalty factors for the two classes is determined from the standard deviations of the projected data and the sample sizes of the two classes. Finally, a standard SVM is trained a second time with these penalty factors, producing a new separating hyperplane. Experiments show that the method is effective: in general it balances the misclassification rates of the two classes, and it can even reduce them.
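The projection and penalty-ratio steps can be illustrated with a minimal Python sketch. It assumes a normal vector `w` is already available from a first SVM training. The abstract does not state the formula that combines the standard deviations and sample sizes, so the rule in `penalty_ratio` is a hypothetical placeholder chosen only for illustration, and the names `project` and `penalty_ratio` are likewise assumptions, not the authors' notation.

```python
from statistics import pstdev


def project(samples, w):
    """Project each sample onto the unit vector along w (scaled dot product)."""
    norm = sum(c * c for c in w) ** 0.5
    return [sum(x_i * w_i for x_i, w_i in zip(x, w)) / norm for x in samples]


def penalty_ratio(w, pos, neg):
    """Return an illustrative ratio C_pos / C_neg of the class penalty factors.

    ASSUMPTION: the rule (n_neg * s_pos) / (n_pos * s_neg) is a placeholder,
    not the formula from the paper. It only shows how the projected standard
    deviations and the sample sizes could enter the ratio.
    """
    s_pos = pstdev(project(pos, w))  # spread of class +1 along w
    s_neg = pstdev(project(neg, w))  # spread of class -1 along w
    if s_pos == 0 or s_neg == 0:
        raise ValueError("projected data of a class has zero spread")
    return (len(neg) * s_pos) / (len(pos) * s_neg)


# Toy data: 2 positive and 4 negative samples, with a made-up normal vector.
w = [3.0, 4.0]
pos = [[1.0, 0.0], [0.0, 1.0]]
neg = [[-1.0, 0.0], [0.0, -1.0], [-1.0, -1.0], [-2.0, -1.0]]
ratio = penalty_ratio(w, pos, neg)
print(f"C_pos / C_neg = {ratio:.4f}")
```

For the second training, the resulting ratio could be passed to any SVM implementation that accepts per-class penalties, for example through the `class_weight` parameter of scikit-learn's `SVC`.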