Model combination aims to integrate and exploit multiple models in the hypothesis space to improve the stability and generalization of a learning system. Since existing model combination methods for support vector machines (SVMs) mostly construct the candidate model set by data sampling, this work studies SVM model combination based on the regularization path. First, the Lh-risk consistency of SVM model combination is proved, which gives a sample-based justification for combining SVM models. A three-step Bayesian combination method for SVMs on the regularization path is then proposed: the piecewise linearity of the SVM regularization path is used to build the initial model set, and an average generalized approximate cross-validation (GACV) pruning strategy is applied to obtain the candidate model set. In the testing or prediction phase, the minimal nearest-neighbor method determines the input-sensitive final combination model set, and prediction is performed by Bayesian combination. Unlike sample-based methods, the three-step Bayesian combination constructs the model set on the whole training set via the regularization path, so the training procedure is easy to implement and computationally efficient. The model-set pruning strategy reduces the size of the model set and improves both computational efficiency and prediction performance. Experimental results verify the effectiveness of the three-step SVM model combination on the regularization path.
Model combination integrates and leverages multiple models in the hypothesis space to improve the reliability and generalization performance of learning systems. In this paper, a novel three-step method for model combination of support vector machines (SVMs) based on the regularization path is proposed. The Lh-risk consistency of SVM model combination is defined and proved, which provides the theoretical foundation of the proposed method. Traditionally, the model set for SVM model combination is constructed by data-sampling methods. In the proposed method, the model set is constructed from the SVM regularization path, which is computed on the same original training set. First, the initial model set is obtained from the piecewise linearity of the SVM regularization path. Then, the average generalized approximate cross-validation (GACV) criterion is applied to exclude poorly performing models and prune the initial model set. The pruning policy improves not only the computational efficiency of model combination but also its generalization performance. In the testing or prediction phase, the input-sensitive combination model set is determined with the minimal-neighborhood method, and the final prediction is obtained by Bayesian combination. Compared with traditional SVM model combination methods, the proposed method does not need to tune the regularization parameter for each individual SVM model, so the training procedure is considerably simplified. Experimental results demonstrate the effectiveness and efficiency of the three-step Bayesian combination of SVMs on the regularization path.
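The following minimal Python sketch illustrates the three-step scheme described above on synthetic data. It is not the paper's implementation: the log-spaced grid of C values only approximates the breakpoints of the piecewise-linear regularization path, the average hinge loss hinge_risk is a stand-in for the average GACV score, and the neighborhood-weighted voting in predict_combined (with a hypothetical neighborhood size k = 15) is just one plausible reading of the minimal-neighborhood Bayesian combination; all function names and parameter values are illustrative assumptions.

```python
# Hypothetical sketch of the three-step SVM combination; not the paper's exact algorithm.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import pairwise_distances

# --- Synthetic data standing in for the paper's benchmark sets ---
X, y = make_classification(n_samples=600, n_features=10, random_state=0)
y = 2 * y - 1                                   # hinge-loss convention: labels in {-1, +1}
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# --- Step 1: initial model set along the regularization path ---
# The paper exploits the piecewise linearity of the SVM regularization path;
# here a log-spaced grid of C values approximates the path breakpoints.
C_grid = np.logspace(-3, 3, 25)
initial_set = [SVC(kernel="rbf", C=C, gamma="scale").fit(X_tr, y_tr) for C in C_grid]

def hinge_risk(model, X, y):
    """Average hinge loss; used here as a stand-in for the average GACV score."""
    margin = y * model.decision_function(X)
    return np.mean(np.maximum(0.0, 1.0 - margin))

# --- Step 2: prune the initial set to obtain the candidate model set ---
# Keep models whose (stand-in) score is no worse than the average over the set.
scores = np.array([hinge_risk(m, X_tr, y_tr) for m in initial_set])
candidate_set = [m for m, s in zip(initial_set, scores) if s <= scores.mean()]

# --- Step 3: input-sensitive Bayesian combination at prediction time ---
def predict_combined(x, k=15):
    """For one input x: take its k nearest training points, weight each candidate
    model by a likelihood-style score on that neighborhood, and combine."""
    d = pairwise_distances(x.reshape(1, -1), X_tr).ravel()
    nbr = np.argsort(d)[:k]                     # minimal-neighborhood selection
    weights = np.array([np.exp(-hinge_risk(m, X_tr[nbr], y_tr[nbr]))
                        for m in candidate_set])
    weights /= weights.sum()                    # normalized, posterior-like weights
    votes = np.array([m.decision_function(x.reshape(1, -1))[0] for m in candidate_set])
    return np.sign(np.dot(weights, votes))

y_pred = np.array([predict_combined(x) for x in X_te])
print("combined test accuracy:", np.mean(y_pred == y_te))
```

A faithful implementation would instead enumerate the exact breakpoints of the piecewise-linear SVM regularization path and evaluate GACV at each breakpoint, which is what makes the path-based construction of the model set cheaper than retraining a separate SVM for every sampled data set.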