With the large-scale application of gene chips, high-dimensional gene expression data contain many irrelevant and redundant features that may degrade classifier performance. To address this problem, a cloud-based maximum mutual information feature extraction method (CMI-Selection) is proposed. The Hadoop cloud computing platform partitions the gene expression data and processes the partitions in parallel, while the maximum mutual information method is used to extract features, yielding a feature filter model on the cloud platform. Experimental results show that the cloud-based maximum mutual information feature extraction method extracts features quickly while maintaining high classification accuracy, saves considerable time, and is an efficient gene feature extraction system.
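The partition-then-score idea can be illustrated with a minimal single-machine sketch. It is not the authors' Hadoop implementation: the function names (`map_partition`, `reduce_top_k`, `cmi_select`), the equal-width discretization, and the simple top-k selection rule are all assumptions made for illustration. The sketch also scores only the relevance of each gene to the class label and does not model redundancy between genes.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Return I(X;Y) in bits for two equal-length sequences of discrete values."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * math.log2(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def discretize(values, bins=3):
    """Equal-width binning (an assumption; the abstract names no scheme)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / bins
    return [min(int((v - lo) / width), bins - 1) for v in values]


def map_partition(partition, labels):
    """Map step: score every gene in one partition against the class labels."""
    return [
        (gene, mutual_information(discretize(expr), labels))
        for gene, expr in partition
    ]


def reduce_top_k(scored_partitions, k):
    """Reduce step: merge per-partition scores and keep the k highest."""
    merged = [pair for part in scored_partitions for pair in part]
    return sorted(merged, key=lambda pair: pair[1], reverse=True)[:k]


def cmi_select(genes, labels, k, n_parts=2):
    """Hypothetical driver: split genes, score each split, merge the results."""
    items = list(genes.items())
    parts = [items[i::n_parts] for i in range(n_parts)]
    # On a cluster each partition would be scored by a separate worker;
    # here the partitions are processed sequentially.
    scored = [map_partition(part, labels) for part in parts]
    return reduce_top_k(scored, k)


if __name__ == "__main__":
    # Toy data: three genes measured on four samples with binary class labels.
    genes = {
        "g1": [0.1, 0.2, 0.9, 1.0],
        "g2": [0.5, 0.5, 0.5, 0.5],
        "g3": [0.1, 0.9, 0.2, 1.0],
    }
    labels = [0, 0, 1, 1]
    print(cmi_select(genes, labels, k=1))
```

Because each gene's score depends only on that gene's expression values and the shared labels, the map step needs no communication between partitions, which is what makes this kind of filter model a natural fit for parallel execution.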