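Extracting informative genes from high-dimensional, small-sample gene microarray data is a difficult task. Building on random feature selection, this paper introduces "seed variables" and a rolling ranking mechanism, and proposes a feature selection algorithm based on professional tennis player ranking (PTPR). Seed variables make the variable search more selective and more efficient, and the search history is fully used to update the seed variables dynamically, which speeds up optimization. Tests on public databases show that, over multiple independent random runs, PTPR recovers on average 50%~80% of the same genes, whereas Michal Draminski's method retains only about 10%~50%. Convergence experiments show that PTPR converges significantly faster. Classification experiments on the independent test sets of five data sets show that PTPR maintains high classification rates: its highest rates are approximately 98%, 90%, 89%, 95% and 75%, against 96%, 89%, 85%, 95% and 70% for Michal Draminski's method. PTPR also achieves higher classification rates than other typical methods. Overall, PTPR searches quickly, gives stable results, and maintains good classification rates across different classifiers.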
Feature selection for high-dimensional microarray data is a challenging problem. In this paper, we propose a novel simulation-based feature selection method that adopts the strategy of professional tennis player ranking (PTPR): the ideas of seed players and dynamic ranking are combined with random search. Introducing seed features makes the feature selection process more competitive. The seed list is updated dynamically, and the best current features are always kept and used in the next round of feature selection, so each competition is selective. The proposed algorithm was tested on widely used public datasets. The results show that, on average, about 50%~80% of the selected genes overlap across independent runs of PTPR, compared with only about 10%~40% for Michal Draminski's method, and that PTPR also converges significantly faster. In classification on five data sets, PTPR maintained good performance and achieved highest classification rates of 98%, 90%, 89%, 95% and 75%, respectively, matching or exceeding those of Michal Draminski's method (96%, 89%, 85%, 95% and 70%, respectively). These results demonstrate that PTPR is an efficient feature selection algorithm with faster convergence, high stability and good classification performance.
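The interplay of seed features, random search and dynamic ranking described above can be illustrated with a minimal sketch. The abstract does not specify the subset size, the scoring rule, or how ranking points expire, so every name and parameter below (`ptpr_sketch`, `score_subset`, `seed_slots`, `decay`, `toy_score`, `overlap`) is a hypothetical placeholder rather than the published algorithm. In practice `score_subset` would presumably be derived from a classifier trained on the candidate subset; here a toy scorer stands in for it.

```python
import random


def ptpr_sketch(n_features, score_subset, n_rounds=200, subset_size=10,
                n_seeds=5, seed_slots=3, decay=0.9, rng=None):
    """Seeded random subset search with a rolling ranking (illustrative only)."""
    rng = rng or random.Random(0)
    points = [0.0] * n_features        # rolling ranking points per feature
    ranking = list(range(n_features))  # current standings, best first
    seeds = []                         # seed features: top of the ranking
    for _ in range(n_rounds):
        # Some seeds get guaranteed slots; the rest is drawn at random.
        kept = rng.sample(seeds, min(seed_slots, len(seeds)))
        pool = [f for f in range(n_features) if f not in kept]
        subset = kept + rng.sample(pool, subset_size - len(kept))
        # Assumption: old points fade, so the ranking favors recent rounds.
        points = [p * decay for p in points]
        # score_subset returns one credit per feature, in subset order.
        for f, credit in zip(subset, score_subset(subset)):
            points[f] += credit
        # Re-rank and refresh the seed list from the current standings.
        ranking = sorted(range(n_features), key=lambda f: points[f],
                         reverse=True)
        seeds = ranking[:n_seeds]
    return ranking


def overlap(run_a, run_b, top_k):
    """Fraction of the top_k features shared by two independent runs."""
    return len(set(run_a[:top_k]) & set(run_b[:top_k])) / top_k


# Toy scorer (hypothetical): only five features carry any signal.
INFORMATIVE = {7, 13, 21, 34, 42}


def toy_score(subset):
    return [1.0 if f in INFORMATIVE else 0.0 for f in subset]
```

Calling `ptpr_sketch(50, toy_score)` returns all 50 feature indices ordered by their final ranking points, and `overlap` applied to two runs with different random generators gives a stability figure of the same kind as the gene-overlap percentages reported above.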