With the development of microarray (gene chip) technology, gene expression experiments produce massive amounts of microarray data, providing a new approach to the study of human disease. However, microarray data are high-dimensional, noisy, and highly redundant, which makes it difficult to mine the knowledge they contain thoroughly and accurately and greatly complicates the selection of informative genes. This paper therefore proposes a hybrid feature selection algorithm for high-dimensional microarray data. The algorithm has two layers. In the first layer, the signal-to-noise ratio (SNR) of every gene is computed, and a specified number of informative genes are selected according to their SNR values to form a candidate subset, filtering out irrelevant genes. In the second layer, an improved Lasso method performs feature selection on this candidate subset to eliminate redundant genes. Experimental results show that the proposed algorithm selects a small number of informative genes with strong classification ability, and that its performance is stable and generalizes well, making it an effective gene feature selection algorithm.
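The two-layer procedure described above can be illustrated with a minimal sketch. Several parts are assumptions, not details from the paper: the abstract does not define the signal-to-noise ratio, so the common two-class form |mean1 - mean0| / (std1 + std0) is assumed; the "improved" Lasso is not specified, so a plain Lasso solved by cyclic coordinate descent stands in for it; and the names `snr_scores`, `lasso_cd`, `hybrid_select`, `k`, and `alpha` are hypothetical.

```python
import numpy as np

def snr_scores(X, y):
    """Per-gene SNR, assumed here to be |mean1 - mean0| / (std1 + std0).

    X: (samples, genes) expression matrix; y: NumPy array of 0/1 class labels.
    """
    pos, neg = X[y == 1], X[y == 0]
    mean_diff = np.abs(pos.mean(axis=0) - neg.mean(axis=0))
    std_sum = pos.std(axis=0, ddof=1) + neg.std(axis=0, ddof=1)
    return mean_diff / (std_sum + 1e-12)  # small constant avoids division by zero

def soft_threshold(z, t):
    """Soft-thresholding operator used by the Lasso coordinate update."""
    return np.sign(z) * max(abs(z) - t, 0.0)

def lasso_cd(X, y, alpha, n_iter=200):
    """Plain Lasso (a stand-in for the paper's improved variant).

    Minimizes (1/2n) * ||y - Xw||^2 + alpha * ||w||_1 by cyclic
    coordinate descent. Expects standardized columns and centered y.
    """
    n, p = X.shape
    w = np.zeros(p)
    resid = y.astype(float).copy()           # equals y - Xw while w = 0
    col_sq = (X ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            resid += X[:, j] * w[j]          # remove gene j's contribution
            rho = X[:, j] @ resid / n
            w[j] = soft_threshold(rho, alpha) / col_sq[j]
            resid -= X[:, j] * w[j]          # add the updated contribution back
    return w

def hybrid_select(X, y, k=50, alpha=0.05):
    """Layer 1: keep the k genes with the highest SNR.
    Layer 2: keep the candidates with nonzero Lasso weight.
    Returns column indices into the original matrix X.
    """
    candidates = np.argsort(snr_scores(X, y))[::-1][:k]
    Xc = X[:, candidates]
    Xc = (Xc - Xc.mean(axis=0)) / (Xc.std(axis=0) + 1e-12)
    w = lasso_cd(Xc, y - y.mean(), alpha)
    return candidates[w != 0]

# Usage on synthetic data: 40 samples, 200 genes, genes 0-2 carry the class signal.
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 20)
X = rng.normal(size=(40, 200))
X[:, :3] += 4.0 * y[:, None]
print(hybrid_select(X, y, k=20, alpha=0.05))
```

In this sketch, `k` controls how many genes survive the filter layer and `alpha` controls how aggressively the Lasso layer prunes the candidates; both would need tuning (for example by cross-validation) on real microarray data.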