DNA microarray (gene chip) technology has had a far-reaching impact on biomedicine, and applying classification methods to the massive data it produces is important for cancer classification and treatment. This paper proposes a feature extraction method for cancer gene expression data that uses entropy as the selection criterion. The gene expression data are first filtered and the entropy of each gene is computed; the genes with the highest entropy are then extracted as feature genes and classified with a support vector machine (SVM). Leave-one-out and group-based experiments on prostate cancer gene expression data both demonstrate the effectiveness of the method.
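The entropy-ranking step described above can be illustrated with a minimal sketch. The abstract does not state how the entropy of a gene is defined, so this sketch assumes the Shannon entropy of each gene's expression values after equal-width binning; the function names `gene_entropy` and `top_entropy_genes`, the `n_bins` parameter, and the toy data are all hypothetical and are not taken from the paper. The filtering step and the SVM classification step are omitted.

```python
import math
from collections import Counter


def gene_entropy(values, n_bins=4):
    """Shannon entropy (in bits) of one gene's expression values.

    Assumption: values are discretized into n_bins equal-width bins;
    the paper's actual entropy definition may differ.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        return 0.0  # a constant gene has zero entropy
    width = (hi - lo) / n_bins
    # Map each value to a bin index; the maximum value goes into the last bin.
    bins = [min(int((v - lo) / width), n_bins - 1) for v in values]
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(bins).values())


def top_entropy_genes(expression, k, n_bins=4):
    """Return the names of the k genes with the highest entropy.

    expression: dict mapping gene name -> list of expression values,
    one value per sample.
    """
    scores = {gene: gene_entropy(vals, n_bins) for gene, vals in expression.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Toy data (hypothetical): three genes measured in four samples.
expression = {
    "gene_a": [1.0, 1.0, 1.0, 1.0],  # constant across samples
    "gene_b": [0.0, 1.0, 2.0, 3.0],  # spread over all bins
    "gene_c": [0.0, 0.0, 3.0, 3.0],  # two distinct levels
}
selected = top_entropy_genes(expression, k=2)
print(selected)
```

In a full pipeline, the selected genes would form the feature vectors passed to an SVM classifier and evaluated with leave-one-out cross-validation, as the abstract describes.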