Tumor diagnosis based on microarray data is expected to become a fast and effective molecular-level diagnostic method in clinical medicine in the near future. However, microarray data are high-dimensional with small sample sizes, which poses a challenge to traditional classification methods, and researchers have therefore turned to ensemble classification algorithms, which offer better performance. To address the low classification accuracy and excessive computational cost of existing ensemble classification algorithms for microarray data, this paper proposes an ensemble classification algorithm based on correlation analysis. The algorithm computes the correlation between training subsets and selects the group of subsets that differ most from one another for training, which effectively increases the diversity of the ensemble. With support vector machines as base classifiers, experimental results on the acute leukemia and colon cancer datasets demonstrate the effectiveness and feasibility of the proposed algorithm. The algorithm's performance under different parameter settings is also tested, and the results provide a reference for choosing reasonable parameter settings.
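The subset-selection idea summarized above can be illustrated with a minimal sketch. The abstract does not specify how correlation between training subsets is measured or how the most diverse group is searched for, so everything here is an assumption made for illustration: candidate subsets are drawn by bootstrap sampling, each subset is represented by a vector of per-sample draw counts, correlation is the Pearson coefficient between those vectors, and the group is chosen greedily. The names `pearson`, `bootstrap_counts`, and `select_diverse` are hypothetical, as are the sample and subset counts in the usage lines. Training one support vector machine per selected subset and combining their votes would follow this step and is not shown.

```python
import random
from math import sqrt


def pearson(u, v):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = sqrt(sum((a - mu) ** 2 for a in u))
    sv = sqrt(sum((b - mv) ** 2 for b in v))
    if su == 0 or sv == 0:
        return 0.0  # a constant vector carries no correlation information
    return cov / (su * sv)


def bootstrap_counts(n_samples, rng):
    """Draw one bootstrap subset; return how often each sample was drawn."""
    counts = [0] * n_samples
    for _ in range(n_samples):
        counts[rng.randrange(n_samples)] += 1
    return counts


def select_diverse(candidates, k):
    """Greedily pick k candidate subsets with low mutual correlation.

    Assumption: greedy search stands in for whatever selection rule
    the paper actually uses.
    """
    m = len(candidates)
    if not 2 <= k <= m:
        raise ValueError("k must satisfy 2 <= k <= number of candidates")
    corr = [[pearson(candidates[i], candidates[j]) for j in range(m)]
            for i in range(m)]
    # Seed the group with the least correlated pair of candidates.
    pairs = ((i, j) for i in range(m) for j in range(i + 1, m))
    i0, j0 = min(pairs, key=lambda p: corr[p[0]][p[1]])
    chosen = [i0, j0]
    while len(chosen) < k:
        rest = [c for c in range(m) if c not in chosen]
        # Add the candidate least correlated, in total, with the group so far.
        best = min(rest, key=lambda c: sum(corr[c][s] for s in chosen))
        chosen.append(best)
    return chosen


# Hypothetical usage: 20 candidate subsets over 30 training samples, keep 5.
rng = random.Random(0)
candidates = [bootstrap_counts(30, rng) for _ in range(20)]
chosen = select_diverse(candidates, 5)
```

The returned indices identify which candidate subsets would be used to train the base classifiers. A greedy search keeps the cost at one pairwise correlation matrix plus a linear scan per added subset, at the price of not guaranteeing the globally most diverse group.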