Model selection is a critical issue in learning with support vector machines (SVMs). Training a standard SVM amounts to solving a convex quadratic programming problem, whose time complexity is cubic in the data size, while classical model selection methods typically require training SVMs many times. Such methods are already computationally expensive for medium-scale problems and are difficult to scale up to large-scale SVM learning. In this paper, based on random Fourier feature approximation of the Gaussian kernel, a novel and efficient model selection approach for kernel SVMs is proposed. Firstly, the random Fourier feature mapping is used to embed the infinite-dimensional implicit feature space into a relatively low-dimensional explicit random feature space, and an upper bound on the error between the accurate model obtained by training the kernel SVM and the approximate one returned by the linear SVM in the random feature space is derived. Then, with this error bound as a theoretical guarantee, a model selection approach for kernel SVMs in the random feature space is presented: linear SVMs in the random feature space are applied to approximate the corresponding kernel SVMs, so that approximate values of the model selection criterion can be computed efficiently and used to assess the relative goodness of the corresponding kernel SVMs. Finally, the feasibility and efficiency of the proposed approach are verified on benchmark datasets. Experimental results show that the proposed approach achieves test accuracy essentially comparable to that of standard cross-validation, while significantly improving the efficiency of model selection for kernel SVMs.
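To make the two steps concrete, the following is a minimal sketch (not the authors' implementation) of the general technique the abstract describes: the Rahimi–Recht random Fourier feature map z(x) = √(2/D)·cos(Wx + b), with rows of W drawn from N(0, 2γI) and b uniform on [0, 2π], satisfies E[z(x)ᵀz(y)] = exp(−γ‖x − y‖²), so a linear SVM trained on z(X) stands in for the Gaussian-kernel SVM during hyperparameter search. The synthetic dataset, the feature dimension D, and the (γ, C) grids are illustrative assumptions, and scikit-learn's LinearSVC is used as the linear SVM solver.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)

def make_rff_map(d, gamma, D, rng):
    """Return a map z with z(x)@z(y) ~= exp(-gamma*||x-y||^2).

    Frequencies W are drawn from the Fourier transform of the Gaussian
    kernel, N(0, 2*gamma*I); phases b are uniform on [0, 2*pi].
    """
    W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(D, d))
    b = rng.uniform(0.0, 2.0 * np.pi, size=D)
    return lambda X: np.sqrt(2.0 / D) * np.cos(X @ W.T + b)

# Illustrative data in place of a benchmark dataset.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_va, y_tr, y_va = train_test_split(X, y, test_size=0.3, random_state=0)

D = 500  # dimension of the explicit random feature space (assumed)
best = (None, None, -np.inf)
for gamma in [2.0 ** k for k in range(-6, 3)]:      # candidate kernel widths
    # One random feature map per gamma, shared by train and validation sets.
    z = make_rff_map(X.shape[1], gamma, D, rng)
    Z_tr, Z_va = z(X_tr), z(X_va)
    for C in [2.0 ** k for k in range(-2, 5)]:      # candidate regularization
        # The linear SVM in the random feature space approximates the
        # kernel SVM; its validation accuracy serves as the approximate
        # model selection criterion.
        clf = LinearSVC(C=C, dual=True, max_iter=10000).fit(Z_tr, y_tr)
        acc = clf.score(Z_va, y_va)
        if acc > best[2]:
            best = (gamma, C, acc)

print("selected (gamma, C):", best[:2], "validation accuracy:", round(best[2], 3))
```

The efficiency gain in this sketch comes from the inner loop: each candidate (γ, C) pair is evaluated by training a linear SVM on the D-dimensional features, which scales roughly linearly in the number of samples, instead of solving the kernel SVM's quadratic program whose cost grows cubically with data size.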