To improve the ability of support vector machines to classify unbalanced data, the essential characteristics of the least squares support vector machine (LS-SVM) were analyzed, and a classification algorithm for unbalanced data based on the center distance ratio was proposed. In addition, the lack of sparseness in LS-SVM was addressed by pruning boundary samples. Experiments on standard UCI datasets indicate that the algorithm effectively improves the classification accuracy of LS-SVM on unbalanced data, yields sparse solutions without degrading training accuracy, and also improves training speed to some extent.
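The abstract does not define the center distance ratio or the pruning rule, so the following is only a minimal sketch of one plausible reading, not the authors' method. It assumes the ratio of a sample is its Euclidean distance to its own class centroid divided by its distance to the opposite class centroid, and that pruning drops samples whose ratio exceeds a cutoff. The function names and the `max_ratio` parameter are hypothetical.

```python
import math


def centroid(points):
    """Return the component-wise mean of a list of equal-length vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def center_distance_ratios(X, y):
    """Assumed definition: distance to own-class centroid divided by
    distance to the opposite-class centroid. Labels must be +1 or -1."""
    c_pos = centroid([x for x, label in zip(X, y) if label == 1])
    c_neg = centroid([x for x, label in zip(X, y) if label == -1])
    ratios = []
    for x, label in zip(X, y):
        own, other = (c_pos, c_neg) if label == 1 else (c_neg, c_pos)
        d_own = math.dist(x, own)
        d_other = math.dist(x, other)
        # A sample sitting exactly on the opposite centroid gets an infinite ratio.
        ratios.append(d_own / d_other if d_other > 0 else math.inf)
    return ratios


def prune_by_ratio(X, y, max_ratio):
    """Hypothetical pruning rule: keep only samples whose ratio is at most
    max_ratio, i.e. drop samples that lie relatively close to the other class."""
    ratios = center_distance_ratios(X, y)
    kept = [(x, label) for x, label, r in zip(X, y, ratios) if r <= max_ratio]
    return [x for x, _ in kept], [label for _, label in kept]


if __name__ == "__main__":
    # Toy one-dimensional layout embedded in 2-D; not data from the paper.
    X = [[0.0, 0.0], [2.0, 0.0], [10.0, 0.0], [6.0, 0.0]]
    y = [1, 1, -1, -1]
    print(center_distance_ratios(X, y))
    print(prune_by_ratio(X, y, max_ratio=0.3))
```

Under this assumed definition, a small ratio marks a sample deep inside its own class and a ratio near or above 1 marks a sample near the class boundary. The reduced training set would then be passed to an LS-SVM solver, which is not shown here.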