Support vector machines (SVMs) often generalize poorly when the separating hyperplane becomes overly complex or the model overfits. Existing nearest-neighbor (NN-SVM) and K-nearest-neighbor (K-NN-SVM) methods address this sample-related problem, but their high time complexity limits their ability to process massive sample sets. This paper introduces a grid into NN-SVM and proposes the G-NN-SVM algorithm, which first partitions the sample space into blocks, then computes sample distances within each block to find nearest neighbors, and is implemented in combination with the chunking sequential minimal optimization (SMO) algorithm. Experiments show that the method reduces computational complexity and speeds up both training and classification while maintaining classification accuracy. It also generalizes well, thereby improving the capacity of the original NN-SVM to handle massive data.
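The grid-restricted neighbor search can be illustrated with a minimal sketch. The abstract does not give the details, so several things here are assumptions: the grid is uniform with a hypothetical `cell_size` parameter, neighbors are searched only inside a sample's own cell, and the pruning rule (drop a sample whose nearest neighbor has a different label) is the rule commonly attributed to NN-SVM, not one confirmed by this text. The function name `grid_nn_prune` is hypothetical, and the SVM training step with chunking SMO is not shown.

```python
from collections import defaultdict
from math import dist, floor


def grid_nn_prune(points, labels, cell_size):
    """Return sorted indices of samples kept after grid-restricted NN pruning."""
    # Bucket sample indices by the grid cell that contains them.
    cells = defaultdict(list)
    for i, p in enumerate(points):
        key = tuple(floor(x / cell_size) for x in p)
        cells[key].append(i)

    keep = []
    for members in cells.values():
        for i in members:
            # Search for the nearest neighbor only among samples in the same cell.
            best_j, best_d = None, float("inf")
            for j in members:
                if j != i:
                    d = dist(points[i], points[j])
                    if d < best_d:
                        best_j, best_d = j, d
            # Assumed rule: keep a sample if it is alone in its cell or if its
            # nearest neighbor carries the same label.
            if best_j is None or labels[best_j] == labels[i]:
                keep.append(i)
    return sorted(keep)


# Three +1 samples, one -1 sample lying among them, and two -1 samples elsewhere.
points = [(0.1, 0.1), (0.2, 0.1), (0.1, 0.2), (0.4, 0.4), (1.5, 1.5), (1.6, 1.6)]
labels = [1, 1, 1, -1, -1, -1]
kept = grid_nn_prune(points, labels, cell_size=1.0)
print(kept)  # [0, 1, 2, 4, 5]
```

Restricting the search to one cell replaces a comparison against every sample with a comparison against only the samples sharing that cell, which is where the reduction in distance computations would come from. The trade-off in this sketch is that a true nearest neighbor lying just across a cell boundary is missed. The retained samples would then be passed to an SVM trainer.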