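Like other feature selection methods, Lasso tends to incur excessive computational cost or to overfit when applied to high-dimensional massive datasets or high-dimensional small-sample datasets. To address this problem, an improved Lasso method, the iterative Lasso method, is proposed. The iterative Lasso method first divides the feature set into K parts. It performs feature selection on the first feature subset, adds the selected features to the second subset, and then performs feature selection on the second subset; the features selected there are added to the third subset, and the iteration continues in this way until the Kth subset, which yields the final feature subset. Experiments show that the iterative Lasso method selects features well on high-dimensional massive and high-dimensional small-sample datasets and is an effective feature selection method. The method has already been applied successfully in classification and prediction models for high-dimensional massive and high-dimensional small-sample data.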
On high-dimensional, large datasets, Lasso, like other feature selection methods, suffers from high computational cost and overfitting. To address this issue, this paper proposes an improved Lasso method: the iterative Lasso method. The iterative Lasso method first divides the feature set into K subsets. It then selects features from the first subset, adds the selected features to the second subset, selects features from that enlarged subset, and continues this iteration up to the Kth subset. Experimental results show that the iterative Lasso method can deal effectively with high-dimensional, large-sample datasets.
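
The block-wise procedure described in the abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not say which solver, objective scaling, or penalty weight is used, so the sketch assumes a plain coordinate-descent Lasso without an intercept, minimizing (1/2n)·||y − Xb||² + alpha·||b||₁. The function names (`soft_threshold`, `lasso_select`, `iterative_lasso`), the value `alpha=0.1`, and the toy data are all hypothetical.

```python
def soft_threshold(value, threshold):
    """Soft-thresholding operator used in the Lasso coordinate update."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_select(X, y, columns, alpha=0.1, n_sweeps=100):
    """Fit a Lasso restricted to `columns` by coordinate descent and
    return the columns whose coefficients are non-zero.

    Assumption: no intercept, objective (1/2n)||y - Xb||^2 + alpha*||b||_1.
    """
    n = len(y)
    coef = {j: 0.0 for j in columns}
    residual = list(y)  # y - X b, with every coefficient starting at zero
    for _ in range(n_sweeps):
        for j in columns:
            col = [row[j] for row in X]
            norm = sum(v * v for v in col) / n
            if norm == 0.0:
                continue  # an all-zero column can never be selected
            # Correlation of column j with the residual that excludes j.
            rho = sum(c * (r + c * coef[j]) for c, r in zip(col, residual)) / n
            new = soft_threshold(rho, alpha) / norm
            delta = new - coef[j]
            if delta != 0.0:
                residual = [r - c * delta for c, r in zip(col, residual)]
                coef[j] = new
    return [j for j in columns if coef[j] != 0.0]


def iterative_lasso(X, y, k, alpha=0.1):
    """Split the features into k consecutive blocks; run the Lasso on each
    block together with the features kept from all earlier blocks."""
    p = len(X[0])
    bounds = [i * p // k for i in range(k + 1)]
    selected = []
    for i in range(k):
        block = list(range(bounds[i], bounds[i + 1]))
        selected = lasso_select(X, y, selected + block, alpha)
    return selected


# Hypothetical toy data: 16 mutually orthogonal +/-1 features (a Hadamard
# design), with the response depending only on features 2 and 11.
n = 16
X = [[-1.0 if bin(i & j).count("1") % 2 else 1.0 for j in range(n)]
     for i in range(n)]
y = [3.0 * row[2] - 2.0 * row[11] for row in X]

print(iterative_lasso(X, y, k=4))  # -> [2, 11]
```

In this sketch each Lasso fit sees at most one block plus the features carried forward, which is where the reduction in per-fit cost comes from. A feature that is discarded in an early block is never reconsidered, so the result can depend on how the features are partitioned; the orthogonal toy design above avoids that issue by construction.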