Support vector machines (SVMs) are among the most important machine learning methods and have been applied successfully to many real-world classification problems. To improve the classification accuracy and training efficiency of SVMs, this paper follows the classification process and reviews the feature selection methods and learning strategies applied before SVM training. It then compares the classification accuracy of the feature selection methods SFS, IWSS, IWSSr, and BARS. It also analyzes classifiers obtained by combining active learning with SVMs, using two performance measures on the test set: classification accuracy and the precision/recall breakeven point. Experimental results show that feature selection methods combining the wrapper and filter approaches effectively improve SVM classification accuracy and reduce the number of training samples required. When labeled data are scarce, active learning achieves better classification accuracy, and passive learning needs six times as many training samples as active learning to reach the same accuracy.
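The precision/recall breakeven point (BEP) mentioned above can be computed directly from an SVM's decision values. The following is a minimal Python sketch, not the paper's implementation. It assumes binary labels in {0, 1} and places the breakeven at the cutoff that predicts exactly as many positives as the test set contains. At that cutoff, precision and recall coincide by construction. The function name `precision_recall_breakeven` is hypothetical.

```python
from typing import Sequence


def precision_recall_breakeven(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Return the precision/recall breakeven point (BEP).

    Assumption: the BEP is taken at the cutoff that labels exactly as many
    examples positive as there are true positives (n_pos). With n_pos
    predictions, precision = TP / n_pos = recall.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    n_pos = sum(1 for y in labels if y == 1)
    if n_pos == 0:
        raise ValueError("BEP is undefined when there are no positive examples")
    # Rank examples by decision value, highest first (stable for ties).
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    # Predict the n_pos top-ranked examples as positive.
    top = ranked[:n_pos]
    true_pos = sum(1 for i in top if labels[i] == 1)
    # Precision and recall share the denominator n_pos here, so they are equal.
    return true_pos / n_pos
```

For instance, decision values `[0.9, 0.8, 0.3, -0.2, -0.5]` with labels `[1, 0, 1, 0, 0]` give a BEP of 0.5. The set contains two positives, and only one of the two top-ranked examples is positive.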