Objective To explore the influence of prior information in microarray data on LASSO-based variable selection methods. Methods After specifying a true model, prior information was incorporated step by step, and simulations programmed in R and MATLAB were used to compare its effect on variable selection by LASSO and by two forms of group LASSO: non-overlap group LASSO (nogLASSO for short) and overlap group LASSO (ogLASSO for short). Results On the simulated microarray data, classical LASSO and ogLASSO showed good prediction accuracy (AUC_LASSO = 0.8915 ≈ AUC_ogLASSO = 0.8923 > AUC_nogLASSO = 0.8396; MSE_nogLASSO = 0.1358 > MSE_ogLASSO = 0.0975 ≈ MSE_LASSO = 0.0928), and LASSO gave the most interpretable model (average numbers of genes selected into the models: 21.52, 111.95 and 101.01, respectively). When gene pathway information was used and X295 was misassigned to the 19th pathway, nogLASSO selected it into the model far less often even though its effect size was unchanged, and prediction accuracy declined markedly, whereas ogLASSO was more robust. Conclusion Incorporating prior information from microarray data did not improve the prediction performance or efficiency of LASSO-based variable selection; the classical LASSO remains an effective method for analysing microarray data.
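The evaluation reported above (AUC, MSE and the number of selected variables for a LASSO fit on simulated data) can be illustrated with a minimal sketch. It is not the authors' R/MATLAB code: the abstract does not state the loss function, penalty level or simulation design, so the sketch assumes a squared-error LASSO fitted to 0/1 labels by coordinate descent, standard-normal predictors, and hypothetical true coefficients. All function names (`lasso_cd`, `run_once`, and so on) are illustrative, and the group LASSO variants are not covered.

```python
import math
import random


def soft_threshold(z, t):
    """Soft-thresholding operator: sign(z) * max(|z| - t, 0)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_cd(X, y, lam, n_iter=100):
    """Coordinate descent for (1/(2n)) * ||y - X b||^2 + lam * ||b||_1 (no intercept)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X b, starting from b = 0
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Covariance of column j with the partial residual (column j added back).
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new_b = soft_threshold(rho, lam) / col_sq[j]
            delta = new_b - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new_b
    return beta


def auc(scores, labels):
    """Rank-based AUC: P(positive score > negative score), ties counted as 1/2."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum((a > b) + 0.5 * (a == b) for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


def mse(scores, labels):
    """Mean squared error between predicted scores and 0/1 labels."""
    return sum((s - l) ** 2 for s, l in zip(scores, labels)) / len(labels)


def simulate(n, p, true_beta, rng):
    """Standard-normal predictors; Bernoulli labels from a logistic model (assumed design)."""
    X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
    y = []
    for row in X:
        eta = sum(b * x for b, x in zip(true_beta, row))
        y.append(1 if rng.random() < 1.0 / (1.0 + math.exp(-eta)) else 0)
    return X, y


def run_once(n=60, p=20, lam=0.05, seed=0):
    """Fit on one simulated training set and evaluate on an independent test set."""
    rng = random.Random(seed)
    true_beta = [1.5, -1.5, 1.0] + [0.0] * (p - 3)  # hypothetical sparse truth
    X_tr, y_tr = simulate(n, p, true_beta, rng)
    X_te, y_te = simulate(n, p, true_beta, rng)
    y_bar = sum(y_tr) / n  # the training mean plays the role of the intercept
    beta = lasso_cd(X_tr, [v - y_bar for v in y_tr], lam)
    scores = [y_bar + sum(b * x for b, x in zip(beta, row)) for row in X_te]
    return {
        "auc": auc(scores, y_te),
        "mse": mse(scores, y_te),
        "n_selected": sum(1 for b in beta if b != 0.0),
    }


if __name__ == "__main__":
    print(run_once())
```

Repeating `run_once` over many seeds and averaging the three quantities would give summary figures of the same kind as those in the Results, although the values depend entirely on the assumed design and are not comparable with the reported ones.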