To address the inaccuracy of the Support Vector Machine (SVM) when classifying imbalanced datasets near the separating hyperplane, an improved SVM-KNN algorithm that combines SVM with K-Nearest Neighbor (KNN) is proposed. In the classification phase, the algorithm computes the distance from the test sample to the optimal SVM hyperplane in the feature space. If the distance is greater than a given threshold, the test sample is classified directly by the SVM; otherwise, all support vectors are taken as the candidate nearest neighbors of the test sample, which is then classified by KNN. Extensive experiments on UCI datasets show that the algorithm markedly improves both the recognition rate of minority-class samples and the overall performance of the classifier.
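The decision rule described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: it assumes an already trained linear SVM (weights `w`, bias `b`), Euclidean distance in the input space for the KNN step, and a simple majority vote. The function names, the toy support vectors, the threshold of 0.5, and `k=3` are all hypothetical choices made for the example.

```python
import math
from collections import Counter


def decision_value(w, b, x):
    """Signed SVM output g(x) = w.x + b (linear kernel assumed for this sketch)."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def euclidean(a, b):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def knn_vote(x, support_vectors, sv_labels, k):
    """Majority vote among the k support vectors closest to x."""
    order = sorted(range(len(support_vectors)),
                   key=lambda i: euclidean(x, support_vectors[i]))
    votes = Counter(sv_labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def svm_knn_predict(x, w, b, support_vectors, sv_labels, threshold, k=3):
    """Use the SVM far from the hyperplane and KNN over all support vectors near it."""
    g = decision_value(w, b, x)
    # Geometric distance from x to the hyperplane w.x + b = 0.
    distance = abs(g) / math.sqrt(sum(wi * wi for wi in w))
    if distance > threshold:
        return 1 if g > 0 else -1
    return knn_vote(x, support_vectors, sv_labels, k)


# Hypothetical toy model: hyperplane x1 = 0, support vectors on the margins x1 = -1 and x1 = +1.
W, B = [1.0, 0.0], 0.0
SUPPORT_VECTORS = [(-1.0, 0.0), (-1.0, 2.0), (1.0, 0.0), (1.0, 2.0), (1.0, -2.0)]
SV_LABELS = [-1, -1, 1, 1, 1]

far_label = svm_knn_predict([3.0, 0.0], W, B, SUPPORT_VECTORS, SV_LABELS, threshold=0.5)
near_label = svm_knn_predict([-0.1, -1.5], W, B, SUPPORT_VECTORS, SV_LABELS, threshold=0.5)
```

In this toy setup the first point lies well beyond the threshold and is labeled by the sign of the SVM output, while the second point lies within 0.5 of the hyperplane and is labeled by a vote among its three nearest support vectors, which here differs from the sign the SVM alone would assign. With a nonlinear kernel, both distances would more naturally be computed in the feature space through the kernel function; the linear case is used only to keep the sketch short.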