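Objective To explore multi-stage combined dimension reduction strategies for high-dimensional biological data. Methods Taking discriminant analysis of microarray data as an example, and using both real and simulated data, a two-stage combined dimension reduction strategy, "preliminary variable selection→further dimension reduction", was proposed. It was combined with the subsequent "discrimination-validation" steps to form a "variable selection→dimension reduction→discrimination→validation" approach to discriminant analysis. Two single dimension reduction methods (PCA, PLS) and four combined methods (PCA+SIR, PCA+SAVE, PLS+SIR and PLS+SAVE) were examined, using the predictive performance of the subsequent discriminant analysis and the stability and sensitivity of the prediction results as criteria. Results In terms of the predictive performance of the discriminant model and the stability and sensitivity of the prediction results, PLS outperformed PCA, and the combined PLS+SIR/SAVE methods performed better still. Conclusion A two-stage combined dimension reduction strategy, with variable selection by t-scores followed by dimension reduction with "PLS+SIR/SAVE", is practical and feasible for discriminant analysis of microarray data.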
Objective To explore a multi-stage combined dimension reduction strategy for analyzing high-dimensional biological data. Methods A two-stage combined strategy, embedded in a four-step procedure, i.e. "variable pre-selection→further dimension reduction→discrimination→validation", was proposed and applied to publicly available microarray data as well as simulated data. Within this procedure, the relative performance of six dimension reduction methods (PCA, PLS, PCA+SIR, PCA+SAVE, PLS+SIR and PLS+SAVE) was evaluated. Results In terms of prediction quality, stability of the prediction results, and sensitivity to the number of genes: (1) PLS was superior to PCA; (2) PLS+SIR and PLS+SAVE performed much better than the other methods. Conclusion The results indicate that the proposed two-stage combined strategy, i.e. variable pre-selection based on t-scores followed by PLS+SIR or PLS+SAVE, is feasible and practical for discriminant analysis of microarray data.
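Illustrative sketch The first stage, variable pre-selection by t-scores, might look roughly like the following. The abstract does not state which t-score formula was used, so the Welch-type two-sample statistic here is an assumption, as are the function names `t_scores` and `select_top_genes` and the toy data. The second stage (PLS followed by SIR or SAVE), the discrimination step and the validation step are not shown.

```python
import math
import statistics


def t_scores(X, y):
    """Two-sample t-score for each gene (column of X).

    Assumption: Welch-type statistic (unequal variances); the source
    does not specify the exact formula.
    X: list of samples, each a list of expression values.
    y: list of class labels (0 or 1), one per sample.
    """
    n_genes = len(X[0])
    scores = []
    for j in range(n_genes):
        g0 = [row[j] for row, label in zip(X, y) if label == 0]
        g1 = [row[j] for row, label in zip(X, y) if label == 1]
        # Standard error of the difference between the two class means.
        se = math.sqrt(statistics.variance(g0) / len(g0)
                       + statistics.variance(g1) / len(g1))
        diff = statistics.mean(g1) - statistics.mean(g0)
        scores.append(diff / se if se > 0 else 0.0)
    return scores


def select_top_genes(X, y, k):
    """Indices of the k genes with the largest |t|, strongest first."""
    scores = t_scores(X, y)
    ranked = sorted(range(len(scores)),
                    key=lambda j: abs(scores[j]), reverse=True)
    return ranked[:k]


# Hypothetical toy data: 6 samples x 4 genes; gene 2 separates the classes.
X = [
    [1.0, 5.1, 0.1, 3.0],
    [1.2, 4.9, 0.2, 2.8],
    [0.9, 5.0, 0.0, 3.1],
    [1.1, 5.2, 2.1, 2.9],
    [1.0, 4.8, 2.0, 3.2],
    [1.3, 5.0, 2.2, 3.0],
]
y = [0, 0, 0, 1, 1, 1]
top = select_top_genes(X, y, k=2)
```

In a full pipeline, the columns indexed by `top` would then be passed to the second-stage reduction. To keep the validation step honest, the selection would normally be recomputed inside each training fold rather than once on all samples.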