Objective In high-dimensional omics feature selection, the features chosen by a selection method change to some degree when the data vary slightly. This study investigates how to measure the stability of feature-selection results. Methods Simulation experiments were used to compare the accuracy and degree of variation of six stability measures: HD, SCSR, TD, KI, CW and RCW. SCSR was then examined further on real data in combination with three feature-selection methods: PLS, svmRFE and RF. Results When the feature ranking was generated at random, SCSR, KI and RCW stayed close to the minimum value of 0 for essentially every number of retained features. When PLS, RF and svmRFE were applied to data sets whose class labels and feature values had been permuted, the three methods showed almost identical stability, and SCSR, KI and RCW reached their minimum expected values at every selection threshold. As for the stability of the measures themselves, HD and SCSR showed very small variation and are therefore more robust than the other measures. Conclusion SCSR performs best in both accuracy and degree of variation, and we recommend it as the measure of stability.
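The abstract does not define the six measures. The sketch below is a minimal illustration of the random-ranking simulation idea. It assumes that KI refers to Kuncheva's consistency index, using its standard textbook definition rather than any formula taken from this paper. Plain Jaccard overlap is included only as an uncorrected point of comparison; it is not claimed to be any of HD, SCSR, TD, CW or RCW. The helper names and parameters (`random_rankings_stability`, `n`, `runs`, `sizes`, `seed`) are hypothetical.

```python
import random
from itertools import combinations


def kuncheva_index(a, b, n):
    """Kuncheva consistency index (assumed meaning of "KI") for two feature
    subsets of equal size k out of n features: (r*n - k^2) / (k*(n - k)),
    where r = |a & b|. Its expected value is 0 when subsets are random."""
    k = len(a)
    if len(b) != k:
        raise ValueError("subsets must have the same size")
    if k == 0 or k == n:
        raise ValueError("index undefined for k = 0 or k = n")
    r = len(a & b)
    return (r * n - k * k) / (k * (n - k))


def jaccard(a, b):
    """Uncorrected set overlap, used here only as a comparison baseline."""
    return len(a & b) / len(a | b)


def mean_pairwise(subsets, measure):
    """Average a pairwise stability measure over all pairs of runs."""
    pairs = list(combinations(subsets, 2))
    return sum(measure(a, b) for a, b in pairs) / len(pairs)


def random_rankings_stability(n=1000, runs=20, sizes=(10, 100, 500), seed=0):
    """Hypothetical null experiment: every run ranks the n features at random,
    and the top-k features of each ranking are compared across runs."""
    rng = random.Random(seed)
    # rng.sample(range(n), n) returns a random permutation of the n features.
    rankings = [rng.sample(range(n), n) for _ in range(runs)]
    results = {}
    for k in sizes:
        top_k = [set(ranking[:k]) for ranking in rankings]
        ki = mean_pairwise(top_k, lambda a, b: kuncheva_index(a, b, n))
        jac = mean_pairwise(top_k, jaccard)
        results[k] = (ki, jac)
    return results


if __name__ == "__main__":
    for k, (ki, jac) in random_rankings_stability().items():
        print(f"k={k:4d}  mean KI={ki:+.3f}  mean Jaccard={jac:.3f}")
```

Under this random selector, the mean KI stays near 0 at every subset size k. The raw Jaccard overlap, by contrast, rises with k purely by chance (to roughly 1/3 when k = n/2). This is the kind of behaviour the simulation uses to judge whether a measure is corrected for chance agreement.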