A microarray data set contains thousands of genes but relatively few samples, and only a small fraction of those genes contribute to tumor classification. The central problem in tumor classification is therefore to identify and select the genes that contribute most to distinguishing the classes. To perform microarray gene selection effectively, we propose describing microarray data with a marginal distribution model (MDM). The model not only determines whether a gene is differentially expressed between two classes of samples, but also identifies which class the gene is expressed in, so the selected genes carry more biological meaning. Experimental results on simulated data and on real microarray data sets show that the method selects informative genes effectively.
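The abstract does not specify the form of the MDM, so the sketch below does not implement it. It only illustrates the two properties the abstract attributes to the model: detecting whether a gene is differentially expressed, and telling which class it is more highly expressed in. As an assumption, it uses Welch's two-sample t-statistic as a stand-in scoring rule. Here |t| measures how strongly a gene separates the two classes, and the sign of t indicates the class with the higher mean. The function names `directional_scores` and `select_genes`, the samples-by-genes list layout, and the toy data are all hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def directional_scores(expr, labels):
    """Score each gene with Welch's t-statistic (class 1 vs class 0).

    Stand-in for illustration only; this is NOT the paper's MDM.
    expr:   list of samples, each a list of gene expression values.
    labels: list of 0/1 class labels, one per sample.
    Returns a list of (gene_index, t, higher_class) tuples, where
    higher_class is 1 or 0 (the class with the larger mean) or None if t == 0.
    """
    class0 = [row for row, y in zip(expr, labels) if y == 0]
    class1 = [row for row, y in zip(expr, labels) if y == 1]
    if len(class0) < 2 or len(class1) < 2:
        raise ValueError("each class needs at least two samples")

    scores = []
    for g in range(len(expr[0])):
        a = [row[g] for row in class1]
        b = [row[g] for row in class0]
        # Welch standard error: per-class variances, no equal-variance assumption
        se = sqrt(variance(a) / len(a) + variance(b) / len(b))
        # Genes with zero variance in both classes get a neutral score of 0
        t = (mean(a) - mean(b)) / se if se > 0 else 0.0
        # The sign of t tells which class expresses the gene more highly
        higher = 1 if t > 0 else 0 if t < 0 else None
        scores.append((g, t, higher))
    return scores


def select_genes(expr, labels, k):
    """Return the k genes with the largest |t|, keeping direction information."""
    scores = directional_scores(expr, labels)
    scores.sort(key=lambda s: abs(s[1]), reverse=True)
    return scores[:k]


if __name__ == "__main__":
    # Hypothetical toy data: 6 samples x 3 genes
    expr = [
        [5.0, 1.0, 3.0],
        [5.2, 1.1, 3.1],
        [4.9, 0.9, 2.9],
        [1.0, 4.8, 3.0],
        [1.2, 5.1, 3.2],
        [0.9, 5.0, 2.8],
    ]
    labels = [1, 1, 1, 0, 0, 0]
    for gene, t, higher in select_genes(expr, labels, 2):
        print(f"gene {gene}: t = {t:.2f}, higher in class {higher}")
```

On this toy matrix, genes 0 and 1 are selected. Gene 0 is flagged as higher in class 1 and gene 1 as higher in class 0, while gene 2, whose class means are equal, scores near zero. A plain absolute-difference ranking would lose that direction, which is the extra information the abstract says the MDM provides.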