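A method combining uncorrelated linear discriminant analysis (ULDA) with the statistical chi-squared test (CHI2) is proposed for the classification and feature selection of proteomic mass spectrometry data. The chi-squared test is first used as a filter to remove variables that show no between-class difference, and ULDA is then applied for sample classification and feature screening. In the analysis of two datasets, the finally selected feature variables gave specificities of 98.2% and 95.74%, respectively, with a sensitivity of 100% in both. The results show that the proposed method handles proteomic data with a very large number of variables well, and that the selected feature variables may serve as potential biomarkers, providing clues for the early diagnosis of related diseases.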
An uncorrelated linear discriminant analysis (ULDA) method combined with the chi-squared test (CHI2) is proposed for the classification and feature selection of proteomic mass spectrometry (MS) data. The method first uses the chi-squared test as a filter to eliminate variables that are irrelevant to classification, and then performs ULDA for sample classification and feature selection. On the two datasets analyzed, the selected variables achieved specificities of 98.2% and 95.74%, respectively, and a sensitivity of 100% in both cases. The results suggest that the proposed approach can differentiate between control and cancer samples, and that the selected variables may serve as potential biomarkers that provide clues for earlier disease detection.
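As an illustration of the filtering step and of the reported metrics, the following is a minimal sketch and not the authors' implementation. It assumes two classes coded 0 (control) and 1 (disease), binarizes each variable at its median before computing a 2x2 chi-squared statistic, and uses 3.841 (the 5% critical value for one degree of freedom) as the cutoff. The function names, the median split, and the cutoff are all assumptions made for this example; the abstract does not specify them. The ULDA step itself is not shown.

```python
from statistics import median


def chi2_binary(feature, labels):
    """Chi-squared statistic of the 2x2 table (value > median) vs. class label.

    Assumption: labels are 0 or 1; the median split is an illustrative choice.
    """
    cut = median(feature)
    # table[i][j]: i = 1 if the value is above the median, j = class label
    table = [[0, 0], [0, 0]]
    for x, y in zip(feature, labels):
        table[int(x > cut)][y] += 1
    n = len(labels)
    row = [sum(table[0]), sum(table[1])]
    col = [table[0][0] + table[1][0], table[0][1] + table[1][1]]
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row[i] * col[j] / n
            if expected > 0:  # skip empty margins to avoid division by zero
                stat += (table[i][j] - expected) ** 2 / expected
    return stat


def chi2_filter(X, labels, threshold=3.841):
    """Return the column indices whose statistic exceeds the threshold.

    Assumption: 3.841 is the 5% critical value for 1 degree of freedom.
    """
    keep = []
    for j in range(len(X[0])):
        column = [sample[j] for sample in X]
        if chi2_binary(column, labels) > threshold:
            keep.append(j)
    return keep


def sensitivity_specificity(y_true, y_pred):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP).

    Assumes both classes occur in y_true.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)
```

In such a pipeline, the indices returned by `chi2_filter` would select the columns passed on to the discriminant analysis, and `sensitivity_specificity` would be evaluated on the classifier's predictions for held-out samples.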