To address the problem of extracting informative genes from gene expression profiles, we use the Wilcoxon rank-sum test to remove "irrelevant genes". Based on the characteristics of high- and low-level gene expression, we establish a bilinear regression model of the high/low expression levels and extract 19 feature genes by residual analysis. A heuristic breadth-first search algorithm is then used to search for the optimal gene subset, which determines the gene "tag" of colon cancer. A support vector machine is used to test the classification performance of the selected genes, and the classification results are good.
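The gene-filtering step with the Wilcoxon rank-sum test can be illustrated with a minimal sketch. This is not the authors' implementation: the significance level `alpha = 0.05`, the normal approximation without tie correction, the function names, and the toy genes `gene_a` and `gene_b` are all assumptions made for illustration.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def rank_sum_p_value(x, y):
    """Two-sided rank-sum p-value (normal approximation, no tie correction)."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    w = sum(ranks[:n1])                      # rank sum of the first sample
    mean_w = n1 * (n1 + n2 + 1) / 2.0        # expectation under H0
    sd_w = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (w - mean_w) / sd_w
    return math.erfc(abs(z) / math.sqrt(2.0))


def filter_genes(expression, labels, alpha=0.05):
    """Keep genes whose expression differs between the two classes."""
    kept = []
    for gene, values in expression.items():
        tumor = [v for v, lab in zip(values, labels) if lab == 1]
        normal = [v for v, lab in zip(values, labels) if lab == 0]
        if rank_sum_p_value(tumor, normal) < alpha:
            kept.append(gene)
    return kept


# Toy data (hypothetical): 5 tumor samples (label 1) and 5 normal samples (label 0).
labels = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
expression = {
    "gene_a": [9, 8, 10, 11, 12, 1, 2, 3, 4, 5],   # clearly separated classes
    "gene_b": [1, 3, 5, 7, 9, 2, 4, 6, 8, 10],     # interleaved, uninformative
}
selected = filter_genes(expression, labels)
```

In this sketch, genes whose p-value is not below `alpha` are treated as "irrelevant" and dropped before any later modelling step. For real data, a library routine with exact or tie-corrected p-values would be preferable to this approximation.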