To address the high-dimensional, small-sample-size problem of gene expression data, a two-phase recognition framework combining partial least squares (PLS) and the maximal margin criterion (MMC) is proposed. The method first uses PLS to extract features that carry class-discriminative information, and then applies MMC to classify the samples. The method was applied to six public microarray data sets and compared with several commonly used gene-expression classification methods. The results show that it is an effective and stable approach to tumor classification from gene expression data.
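A minimal sketch of the two-phase pipeline follows, in Python with NumPy. The abstract does not specify the PLS variant, the number of components, or how MMC is turned into a decision rule, so these are assumptions: PLS2 is fitted by NIPALS-style deflation against a one-hot class indicator matrix, MMC is taken as the projection onto the leading eigenvectors of S_b - S_w (between-class minus within-class scatter), and each sample is assigned to the nearest class centroid in that projected space. All function and class names (`pls_fit`, `mmc_fit`, `PlsMmcClassifier`) are hypothetical.

```python
import numpy as np


def pls_fit(X, Y, n_components):
    """Hypothetical PLS2 feature extractor (NIPALS-style deflation).

    X: (n_samples, n_genes) expression matrix; Y: (n_samples, n_classes)
    one-hot class indicators. Returns the column means of X and a
    projection matrix R such that the PLS scores are (X - x_mean) @ R.
    """
    x_mean = X.mean(axis=0)
    Xk = X - x_mean
    Yc = Y - Y.mean(axis=0)
    W, P = [], []
    for _ in range(n_components):
        # PLS2 weight: dominant left singular vector of Xk^T Y. Y need not be
        # deflated, because the deflated Xk is orthogonal to earlier scores.
        u, _, _ = np.linalg.svd(Xk.T @ Yc, full_matrices=False)
        w = u[:, 0]
        t = Xk @ w                 # latent score vector
        p = Xk.T @ t / (t @ t)     # X loading vector
        Xk = Xk - np.outer(t, p)   # remove the part of X explained by t
        W.append(w)
        P.append(p)
    W, P = np.column_stack(W), np.column_stack(P)
    # R = W (P^T W)^{-1} maps centred X straight to the PLS scores.
    R = W @ np.linalg.inv(P.T @ W)
    return x_mean, R


def mmc_fit(T, labels, n_dims):
    """Projection maximising tr(V^T (S_b - S_w) V) subject to V^T V = I."""
    n, d = T.shape
    m = T.mean(axis=0)
    Sb = np.zeros((d, d))
    Sw = np.zeros((d, d))
    for c in np.unique(labels):
        Tc = T[labels == c]
        pc = len(Tc) / n                           # class prior
        diff = Tc.mean(axis=0) - m
        Sb += pc * np.outer(diff, diff)            # between-class scatter
        centred = Tc - Tc.mean(axis=0)
        Sw += pc * centred.T @ centred / len(Tc)   # within-class scatter
    _, evecs = np.linalg.eigh(Sb - Sw)             # eigenvalues ascending
    return evecs[:, ::-1][:, :n_dims]              # leading eigenvectors


class PlsMmcClassifier:
    """Hypothetical PLS + MMC pipeline with a nearest-centroid rule (assumed)."""

    def __init__(self, n_components=3):
        self.n_components = n_components

    def fit(self, X, labels):
        labels = np.asarray(labels)
        self.classes_ = np.unique(labels)
        Y = (labels[:, None] == self.classes_[None, :]).astype(float)
        # Phase 1: PLS feature extraction.
        self.x_mean_, self.R_ = pls_fit(X, Y, self.n_components)
        T = (X - self.x_mean_) @ self.R_
        # Phase 2: MMC projection to (n_classes - 1) dimensions.
        self.V_ = mmc_fit(T, labels, len(self.classes_) - 1)
        Z = T @ self.V_
        self.centroids_ = np.array(
            [Z[labels == c].mean(axis=0) for c in self.classes_])
        return self

    def predict(self, X):
        Z = (X - self.x_mean_) @ self.R_ @ self.V_
        dists = ((Z[:, None, :] - self.centroids_[None, :, :]) ** 2).sum(axis=2)
        return self.classes_[dists.argmin(axis=1)]
```

For example, `PlsMmcClassifier(n_components=3).fit(X_train, y_train).predict(X_test)` takes an expression matrix with one row per sample and one column per gene. Keeping the number of PLS components small means the MMC step works in a low-dimensional space, which is the purpose of the first phase when genes far outnumber samples.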