For the variable selection problem in linear regression models, this paper proposes a new Boosting learning method based on the genetic algorithm. Every training example is assigned a weight, initially equal for all examples, and a traditional genetic algorithm serves as the base learning algorithm of Boosting. The training set, together with its current weight distribution, is taken as the input of the genetic algorithm, which performs variable selection. The weight distribution over the training set is then updated according to how well the previous variable selection performed. These steps are repeated multiple times, and the resulting variable selections are finally merged by a weighted combination rule. Experiments on simulated and real-world data show that the proposed Boosting method significantly improves the variable selection quality of the traditional genetic algorithm and accurately identifies the covariates related to the response variable. It therefore provides an effective new approach to variable selection in linear regression models.
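The abstract leaves the GA fitness function, the weight-update rule and the fusion rule unspecified, so the Python sketch below fills them in with assumptions of its own: the GA minimizes a weighted BIC, the example weights are updated with the AdaBoost.R2 rule (linear loss on absolute residuals), and the final model keeps every variable whose weighted selection frequency exceeds 0.5. The weights enter the GA through weighted least squares rather than weighted resampling, and all names (`solve`, `wls_residuals`, `weighted_bic`, `ga_select`, `boosting_ga`, `make_data`) are hypothetical. It is meant only to illustrate the loop structure (equal initial weights, GA on the weighted training set, weight update from the previous selection, weighted fusion), not the paper's exact algorithm.

```python
import math
import random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def wls_residuals(X, y, w, mask):
    """Weighted least squares on the columns selected by `mask`, plus an intercept."""
    cols = [j for j, keep in enumerate(mask) if keep]
    Z = [[1.0] + [row[j] for j in cols] for row in X]
    k, n = len(cols) + 1, len(y)
    # Normal equations Z^T W Z c = Z^T W y, with a tiny ridge for numerical safety.
    G = [[sum(w[i] * Z[i][a] * Z[i][b] for i in range(n)) + (1e-9 if a == b else 0.0)
          for b in range(k)] for a in range(k)]
    h = [sum(w[i] * Z[i][a] * y[i] for i in range(n)) for a in range(k)]
    coef = solve(G, h)
    return [y[i] - sum(z * c for z, c in zip(Z[i], coef)) for i in range(n)]

def weighted_bic(X, y, w, mask):
    """Assumed GA fitness: BIC built from the weighted residual sum of squares."""
    n = len(y)
    wrss = sum(wi * ri * ri for wi, ri in zip(w, wls_residuals(X, y, w, mask)))
    return n * math.log(wrss + 1e-12) + (sum(mask) + 1) * math.log(n)

def ga_select(X, y, w, rng, pop_size=20, generations=25, p_mut=0.05):
    """Binary GA: tournament selection, uniform crossover, bit-flip mutation, elitism."""
    p = len(X[0])
    pop = [[rng.random() < 0.5 for _ in range(p)] for _ in range(pop_size)]
    cache = {}

    def fit(m):  # lower is better; cached because fitness is fixed for a given w
        key = tuple(m)
        if key not in cache:
            cache[key] = weighted_bic(X, y, w, m)
        return cache[key]

    def pick():  # binary tournament on the current population
        a, b = rng.sample(pop, 2)
        return a if fit(a) < fit(b) else b

    for _ in range(generations):
        children = [min(pop, key=fit)[:]]  # elitism: keep the best mask unchanged
        while len(children) < pop_size:
            pa, pb = pick(), pick()
            child = [ga if rng.random() < 0.5 else gb for ga, gb in zip(pa, pb)]
            children.append([(not g) if rng.random() < p_mut else g for g in child])
        pop = children
    return min(pop, key=fit)

def boosting_ga(X, y, rounds=5, threshold=0.5, seed=0):
    """Boosting with a GA base learner; the weight update follows AdaBoost.R2 (assumed)."""
    rng = random.Random(seed)
    n, p = len(y), len(X[0])
    w = [1.0 / n] * n  # every training example starts with the same weight
    votes, total = [0.0] * p, 0.0
    for _ in range(rounds):
        mask = ga_select(X, y, w, rng)
        r = [abs(e) for e in wls_residuals(X, y, w, mask)]
        rmax = max(r) or 1.0
        loss = [e / rmax for e in r]  # linear loss in [0, 1]
        err = sum(wi * li for wi, li in zip(w, loss))
        if err >= 0.5:  # this round's selection is too poor to use
            break
        beta = max(err, 1e-10) / (1.0 - err)
        alpha = math.log(1.0 / beta)  # confidence assigned to this round's selection
        votes = [v + alpha * keep for v, keep in zip(votes, mask)]
        total += alpha
        # Well-fitted examples (small loss) are down-weighted; poorly fitted ones keep weight.
        w = [wi * beta ** (1.0 - li) for wi, li in zip(w, loss)]
        s = sum(w)
        w = [wi / s for wi in w]
    # Weighted fusion: keep variables chosen by more than `threshold` of the total vote.
    return [j for j in range(p) if total > 0 and votes[j] / total > threshold]

def make_data(n=60, p=8, seed=1):
    """Toy data: only x0, x2 and x4 influence y."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
    y = [3 * r[0] - 2 * r[2] + 1.5 * r[4] + rng.gauss(0.0, 1.0) for r in X]
    return X, y
```

On the toy data from `make_data()`, `boosting_ga(X, y)` should return a list containing indices 0, 2 and 4, the only columns used to generate `y`. The pure-Python least-squares solver keeps the sketch dependency-free but is only practical for small sample sizes and few candidate variables.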