Parameter selection is one of the key problems in support vector classification and regression, and with large training samples an exhaustive search over a wide range is extremely time-consuming. By combining uniform design (UD) with self-calling support vector regression (SVR) and with partial least-squares regression (PLR), two strategies, UD-SVR and UD-PLR, are proposed that convert the large-sample search into a small-sample search. Within the default search range, uniform design generates a subset of parameter combinations, and each combination is cross-tested on the training set to obtain an evaluation index (accuracy for classification, mean squared error for regression). Taking the evaluation index as the objective function on the small sample formed by these parameter combinations, UD-SVR builds a model by self-calling SVR with a wide-range leave-one-out search, while UD-PLR models the small sample directly with PLR; the model then predicts all parameter combinations within the default range. The combination with the best predicted evaluation index is used to train the large sample and complete the independent prediction. Independent predictions on 8 benchmark classification datasets and 8 regression datasets show that, while maintaining prediction accuracy, the two new methods greatly shorten training and modeling time and provide a new, effective solution for SVM parameter selection with large samples; UD-SVR is more robust than UD-PLR.
Parameter selection is a key problem in support vector classification and regression, and exhaustive search is very time-consuming when training large-scale samples. Two new methods, UD-SVR and UD-PLR, which convert the large-scale search into a small-scale search, are proposed by combining uniform design with self-calling Support Vector Regression (SVR) and with partial least-squares regression (PLR), respectively. First, a small set of parameter combinations is extracted from the default large-scale search domain via a mixed uniform design table, and the evaluation index of each combination is obtained by cross-testing it on the training set with SVM. Second, the small dataset consisting of these parameter combinations and their evaluation indexes is modeled in two different ways: UD-SVR trains it by self-calling SVR with the leave-one-out method, while UD-PLR models it with PLR. The resulting model is then used to predict all parameter combinations in the default search domain, and the combination with the best predicted evaluation index is selected. Finally, the large-scale samples are trained and independently predicted with the best parameter combination. Experiments on 8 benchmark classification datasets and 8 benchmark regression datasets illustrate that the new methods not only maintain prediction precision but also reduce training time markedly. They provide an efficient solution to model selection for Support Vector Machines (SVM) on large-scale samples, and UD-SVR is more robust than UD-PLR.
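The workflow described above can be illustrated with a minimal sketch of the UD-PLR branch. This is an assumption-laden illustration, not the authors' code: scikit-learn's SVC and PLSRegression stand in for the paper's SVM and PLS implementations, a random sub-sample of the grid stands in for a true uniform design table, and the dataset, the 13-point sample size, and the log2 search ranges are illustrative choices.

```python
# Sketch of the UD-PLR idea: evaluate a small subset of parameter combinations,
# fit a surrogate model from parameters to accuracy, predict the whole grid,
# and train the final SVM with the predicted-best combination.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.cross_decomposition import PLSRegression

# Default (large) search grid over log2(C) and log2(gamma) -- assumed ranges.
log2C_grid = np.arange(-5, 16, 1.0)
log2g_grid = np.arange(-15, 4, 1.0)
full_grid = np.array([[c, g] for c in log2C_grid for g in log2g_grid])

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# Step 1: pick a small subset of parameter combinations.  The paper uses a
# uniform design table here; a random sub-sample is used only as a stand-in.
rng = np.random.default_rng(0)
small = full_grid[rng.choice(len(full_grid), size=13, replace=False)]

# Step 2: evaluation index of each small-sample combination (cross-validated
# accuracy, since this example is a classification task).
scores = np.array([
    cross_val_score(SVC(C=2.0**c, gamma=2.0**g), X_tr, y_tr, cv=5).mean()
    for c, g in small
])

# Step 3 (UD-PLR branch): model (parameters -> accuracy) with PLS regression
# and predict the accuracy of every combination in the default grid.
pls = PLSRegression(n_components=2).fit(small, scores)
pred = pls.predict(full_grid).ravel()

# Step 4: train the final SVM with the combination predicted to be best,
# then evaluate it on the held-out test set.
best_c, best_g = full_grid[np.argmax(pred)]
final = SVC(C=2.0**best_c, gamma=2.0**best_g).fit(X_tr, y_tr)
print("predicted-best (log2 C, log2 gamma):", (best_c, best_g),
      "test accuracy:", final.score(X_te, y_te))
```

In the UD-SVR branch, the PLS surrogate in step 3 would instead be an SVR whose own parameters are chosen by a wide-range leave-one-out search over the small sample, as the abstract describes; the rest of the pipeline is unchanged.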