Support vector machines (SVMs) are grounded in statistical learning theory and handle small-sample learning problems well. They are less well suited, however, to databases containing very large numbers of samples, because the size of the training set strongly affects both training speed and the number of support vectors (SVs). Experiments show that retaining the border samples of the training set and pruning some of the non-border samples markedly reduces the number of training samples and SVs, and thus speeds up training, while generalization ability remains almost as good as with the original training set.
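The pruning idea can be illustrated with a minimal sketch. The abstract does not state how border samples are identified, so the criterion used here is an assumption: a sample is treated as a border sample when at least one of its k nearest neighbours belongs to a different class. The function name `prune_non_border` and the parameters `k`, `keep_fraction`, and `seed` are hypothetical and chosen only for illustration.

```python
import math
import random


def prune_non_border(samples, labels, k=3, keep_fraction=0.2, seed=0):
    """Return the sorted indices of the samples kept for training.

    Assumption (not from the source): a sample counts as a border sample
    when at least one of its k nearest neighbours (Euclidean distance)
    carries a different label.
    """
    n = len(samples)
    border, interior = [], []
    for i in range(n):
        # Brute-force neighbour search: O(n^2 log n) overall, fine for a sketch.
        others = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: math.dist(samples[i], samples[j]),
        )
        neighbours = others[:k]
        if any(labels[j] != labels[i] for j in neighbours):
            border.append(i)
        else:
            interior.append(i)
    # Keep every border sample plus a random fraction of the interior ones.
    rng = random.Random(seed)
    n_keep = int(keep_fraction * len(interior))
    kept_interior = rng.sample(interior, n_keep)
    return sorted(border + kept_interior)


# Toy data: two classes laid out along a line, meeting between x=9 and x=10.
samples = [(float(x), 0.0) for x in range(20)]
labels = [0] * 10 + [1] * 10
kept = prune_non_border(samples, labels, k=2, keep_fraction=0.5)
print(len(kept), "of", len(samples), "samples kept")
```

The reduced index set would then be used to select the training subset passed to whichever SVM trainer is in use. The brute-force neighbour search is itself quadratic in the number of samples, so a spatial index or an approximate search would probably be needed for a genuinely large database.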