Objective To evaluate a method that combines significance analysis of microarrays (SAM) with the support vector machine (SAM-SVM) for variable selection in high-dimensional data. Methods The procedure was implemented in R. Variables were ranked by importance according to the SAM algorithm, and an SVM classification model was used to assess the predictive ability of the ranked variables. Iteration continued until the convergence conditions were met, at which point the "optimal" model was selected automatically. The method was applied to real high-dimensional data to assess its performance in practice, and to simulated data to verify its validity. Results Applied to real high-dimensional gene expression data sets for three diseases, SAM-SVM gave good variable selection results in each case. The simulations also showed that SVM classification based on the selected variables achieved high accuracy. Conclusion The SAM-based stepwise SVM discriminant method shows potential advantages for small-sample, nonlinear, and high-dimensional problems, and can be applied effectively to feature selection in high-dimensional gene expression data.
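
The rank-then-validate loop described in the Methods can be illustrated with a minimal sketch. It is not the authors' R implementation, and it rests on several assumptions: the ranking uses the two-class SAM relative difference d = (mean1 - mean0) / (s + s0), with s0 set to the median of the pooled standard errors rather than tuned as SAM does; the "convergence condition", which the abstract does not specify, is taken to be no improvement in leave-one-out accuracy over `patience` consecutive steps; and a nearest-centroid rule stands in for the SVM so that the example needs only the Python standard library. All function names and the toy data are hypothetical.

```python
import math
import random
import statistics


def sam_scores(X, y, s0=None):
    """SAM-style relative difference for each column of X (class labels 0/1)."""
    g0 = [row for row, lab in zip(X, y) if lab == 0]
    g1 = [row for row, lab in zip(X, y) if lab == 1]
    n0, n1 = len(g0), len(g1)
    diffs, sds = [], []
    for j in range(len(X[0])):
        a = [r[j] for r in g0]
        b = [r[j] for r in g1]
        ma, mb = statistics.fmean(a), statistics.fmean(b)
        ss = sum((v - ma) ** 2 for v in a) + sum((v - mb) ** 2 for v in b)
        # Pooled standard error of the difference in class means.
        sds.append(math.sqrt((1 / n0 + 1 / n1) * ss / (n0 + n1 - 2)))
        diffs.append(mb - ma)
    if s0 is None:
        s0 = statistics.median(sds)  # assumption: simple stand-in for SAM's tuned s0
    return [d / (s + s0) for d, s in zip(diffs, sds)]


def nearest_centroid(train_X, train_y, x):
    """Stand-in classifier (NOT an SVM): predict the label of the closest class mean."""
    best, best_dist = None, math.inf
    for lab in sorted(set(train_y)):
        rows = [r for r, l in zip(train_X, train_y) if l == lab]
        centroid = [statistics.fmean(col) for col in zip(*rows)]
        dist = sum((a - b) ** 2 for a, b in zip(x, centroid))
        if dist < best_dist:
            best, best_dist = lab, dist
    return best


def loo_accuracy(X, y, cols, classify):
    """Leave-one-out accuracy of `classify` restricted to the columns in `cols`."""
    hits = 0
    for i in range(len(X)):
        train_X = [[r[c] for c in cols] for k, r in enumerate(X) if k != i]
        train_y = [lab for k, lab in enumerate(y) if k != i]
        hits += classify(train_X, train_y, [X[i][c] for c in cols]) == y[i]
    return hits / len(X)


def stepwise_select(X, y, classify=nearest_centroid, patience=3):
    """Add variables in order of |d|; stop after `patience` steps without improvement."""
    scores = sam_scores(X, y)
    ranked = sorted(range(len(scores)), key=lambda j: abs(scores[j]), reverse=True)
    best_k, best_acc, stale = 0, -1.0, 0
    for k in range(1, len(ranked) + 1):
        acc = loo_accuracy(X, y, ranked[:k], classify)
        if acc > best_acc:
            best_k, best_acc, stale = k, acc, 0
        else:
            stale += 1
            if stale >= patience:  # assumed convergence rule
                break
    return ranked[:best_k], best_acc


# Toy data: 20 samples, 50 variables, only the first 3 differ between classes.
random.seed(0)
y = [0] * 10 + [1] * 10
X = [[random.gauss(0, 1) + (4.0 if lab == 1 and j < 3 else 0.0) for j in range(50)]
     for lab in y]
selected, acc = stepwise_select(X, y)
print("selected variables:", selected, "LOO accuracy:", acc)
```

To come closer to the method named in the abstract, the `classify` argument could wrap an actual SVM (for example `sklearn.svm.SVC` in Python or `e1071::svm` in R). Note also that this sketch ranks variables once on the full data for brevity; an unbiased accuracy estimate would repeat the ranking inside each cross-validation fold.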