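Objective To examine the application of supervised principal component analysis (SuperPC) and partial Cox regression to survival prediction from gene data. Methods Simulation studies were designed around two characteristics of gene data, namely that the number of covariates exceeds the number of samples and that the variables are correlated; three internationally available public gene data sets were also analyzed to assess the predictive performance of the two models. Results The simulations showed that the predictive performance of both methods improved as the variance of the gene block affecting survival increased and as the within-group correlation coefficient rose, and deteriorated as the censoring proportion increased. The real-data analysis suggested that the best-suited method differs from one data set to another. Conclusion Both SuperPC and partial Cox regression are suitable for survival analysis of gene data. In the simulations SuperPC performed better than partial Cox regression, but partial Cox regression was faster to compute.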
Objective To explore supervised principal components and partial least squares Cox regression models and their application to survival prediction from gene expression data. Methods Data were simulated from Cox regression models in which the number of independent variables far exceeded the sample size, and three publicly available data sets were analyzed with both methods. Results The simulation study showed that both methods performed better as the variances of the genes and the within-group correlation increased, and worse as the censoring proportion increased. The real-data analysis indicated that the optimal method differed across data sets. Conclusion Both models are appropriate for survival prediction from gene expression data. Although supervised principal components regression predicted better than partial least squares Cox regression in the simulation study, the latter was generally faster to compute.
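The simulation design summarized above (more covariates than samples, a correlated block of genes driving survival, and a controlled censoring proportion) might be sketched as follows. This is a minimal illustration and not the authors' actual design: the function name, all parameter names and default values, the constant baseline hazard, and the way the censoring scale is calibrated are assumptions made for the example.

```python
import numpy as np

def simulate_gene_survival(n=100, p=1000, block_size=50, block_var=1.0,
                           icc=0.5, beta=1.0, censor_rate=0.3, seed=0):
    """Simulate p genes for n samples (p > n); only the first block affects survival.

    All names and defaults are hypothetical choices for illustration.
    """
    rng = np.random.default_rng(seed)

    # Noise genes: independent standard normal expression values.
    X = rng.standard_normal((n, p))

    # Informative block: a shared factor plus gene-specific noise gives each
    # gene variance block_var and pairwise within-block correlation icc.
    shared = rng.standard_normal((n, 1))
    own = rng.standard_normal((n, block_size))
    X[:, :block_size] = np.sqrt(block_var) * (
        np.sqrt(icc) * shared + np.sqrt(1.0 - icc) * own
    )

    # Cox model with an assumed constant baseline hazard of 1:
    # the event time is exponential with rate exp(eta).
    eta = beta * X[:, :block_size].mean(axis=1)
    event_time = rng.exponential(scale=np.exp(-eta))

    # Exponential censoring times c * E. A subject is censored when
    # c * E < event_time, i.e. when event_time / E > c, so c is set to the
    # (1 - censor_rate) quantile of that ratio to hit the target proportion.
    e = rng.exponential(size=n)
    c = np.quantile(event_time / e, 1.0 - censor_rate)
    censor_time = c * e

    time = np.minimum(event_time, censor_time)
    event = (event_time <= censor_time).astype(int)  # 1 = event, 0 = censored
    return X, time, event

if __name__ == "__main__":
    X, time, event = simulate_gene_survival()
    print(X.shape, "censored fraction:", 1.0 - event.mean())
```

Varying `block_var`, `icc`, and `censor_rate` in such a generator corresponds to the three factors whose effect on prediction the abstract reports; the fitted models themselves (SuperPC and partial Cox regression) are outside the scope of this sketch.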