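To better identify the key features for diagnosing diseases characterized by many influencing factors and small sample sizes, and thereby support sound clinical diagnostic decisions, a method for identifying key diagnostic features that combines the elastic net with the support vector machine (SVM) algorithm is proposed. The feature selection capability of the elastic net is used to reduce the dimensionality of the original data set, yielding a ranked sequence of the features that influence diagnosis. Key feature subsets are then selected according to this sequence, and the classification accuracy of each subset is obtained with an SVM and 10-fold cross-validation. The method is tested on the Arrhythmia data set from the UCI repository. The results show that the method achieves high classification accuracy, reduces the dimensionality of the original sample data more effectively, and removes redundant and irrelevant features among the influencing factors, making it suitable for identifying key diagnostic features in high-dimensional, low-sample-size data sets.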
To better identify the critical features for disease diagnosis in data with high-dimensional features and small sample sizes, and to provide useful guidance for clinical diagnostic decision making, this paper proposes a method for identifying critical diagnostic features based on the elastic net and the support vector machine (SVM). First, the elastic net is used to reduce the dimensionality of the feature space of the original data set and to obtain a feature ranking according to the relationship between each feature and the diagnosis. Then, the classification accuracy of the feature subsets selected according to this ranking is evaluated using an SVM and 10-fold cross-validation. Finally, the method is demonstrated on the Arrhythmia data set from the UCI Machine Learning Repository. Compared with other algorithms, the proposed method achieves higher classification accuracy and is more effective at removing redundant and irrelevant features.
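
The two-stage procedure described above (elastic net ranking, then SVM accuracy under 10-fold cross-validation for each top-k subset) might be sketched as follows. This is a minimal illustration using scikit-learn, not the authors' implementation. Everything specific in it is an assumption: the synthetic binary data stands in for the Arrhythmia data set, the helper names `rank_features` and `subset_accuracy` are hypothetical, and the values of `alpha`, `l1_ratio`, the RBF kernel, and the subset sizes are arbitrary choices.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import ElasticNet
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def rank_features(X, y, alpha=0.05, l1_ratio=0.5):
    """Return feature indices ordered by decreasing |elastic net coefficient|.

    Assumption: the class label is regressed directly on standardized
    features; alpha and l1_ratio are illustrative, not tuned values.
    """
    X_std = StandardScaler().fit_transform(X)
    model = ElasticNet(alpha=alpha, l1_ratio=l1_ratio, max_iter=10000)
    model.fit(X_std, y)
    return np.argsort(-np.abs(model.coef_))


def subset_accuracy(X, y, order, k):
    """Mean 10-fold cross-validated SVM accuracy on the top-k ranked features."""
    cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
    clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
    return cross_val_score(clf, X[:, order[:k]], y, cv=cv).mean()


# Synthetic stand-in for a high-dimensional, low-sample-size data set.
X, y = make_classification(
    n_samples=200, n_features=100, n_informative=8,
    n_redundant=10, random_state=0,
)

order = rank_features(X, y)
scores = {k: subset_accuracy(X, y, order, k) for k in (5, 10, 20, 50, 100)}
for k, acc in scores.items():
    print(f"top {k:3d} features: mean accuracy = {acc:.3f}")
```

Note that this sketch ranks features on the full data set before cross-validating, which can make the reported accuracies optimistic. A stricter variant would recompute the ranking inside each training fold.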