To address semi-supervised classification of imbalanced data, a semi-supervised classification algorithm based on Biased-SVM is proposed. The method proceeds in four steps. First, a Biased-SVM model, which is designed to handle imbalanced data, is trained on the initial labeled sample set. Second, the trained model assigns labels to the unlabeled samples. Third, the newly labeled samples are added to the initial labeled sample set and the Biased-SVM model is retrained. Finally, the retrained model is evaluated on the test set. Experiments were carried out on several data sets from public repositories. On binary imbalanced data sets, with the proportion of labeled samples between 20% and 80%, the proposed method raises the F-value of the minority class without reducing the overall G-mean of the data set, and it shows high stability. On multi-class imbalanced data sets, over the same range of labeled-sample proportions, the method raises the recognition rate of the minority class without reducing the overall EG-mean of the data set, and again shows high stability.
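The four steps of the method, together with the binary G-mean and F-value measures, can be illustrated with a minimal sketch. It rests on several assumptions that are not stated in the abstract: Biased-SVM is taken to mean a soft-margin SVM with separate misclassification penalties for the minority and majority classes (the hypothetical parameters `c_pos` and `c_neg`), the model is linear and fitted by plain subgradient descent, G-mean is the geometric mean of the two per-class recall rates, and F-value is the F1 score of the minority class. The toy data and all function names are illustrative only and are not taken from the paper.

```python
import math

def train_biased_svm(X, y, c_pos=1.0, c_neg=1.0, epochs=200, lr=0.01, lam=0.01):
    """Linear SVM with class-dependent hinge penalties (assumed reading of
    Biased-SVM). Labels: +1 = minority class, -1 = majority class."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            cost = c_pos if yi == 1 else c_neg
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            w = [wj * (1.0 - lr * lam) for wj in w]  # L2 shrinkage on the weights
            if margin < 1.0:  # hinge loss active: take a cost-weighted step
                w = [wj + lr * cost * yi * xj for wj, xj in zip(w, xi)]
                b += lr * cost * yi
    return w, b

def predict(model, X):
    w, b = model
    return [1 if sum(wj * xj for wj, xj in zip(w, xi)) + b >= 0 else -1 for xi in X]

def self_train(X_lab, y_lab, X_unlab, **params):
    """Steps 1 to 3: train, label the unlabeled samples, retrain on the union."""
    model = train_biased_svm(X_lab, y_lab, **params)
    pseudo = predict(model, X_unlab)
    model = train_biased_svm(X_lab + X_unlab, y_lab + pseudo, **params)
    return model, pseudo

def _counts(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == -1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == -1 and p == -1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == -1 and p == 1)
    return tp, fn, tn, fp

def g_mean(y_true, y_pred):
    """Geometric mean of minority-class recall and majority-class recall."""
    tp, fn, tn, fp = _counts(y_true, y_pred)
    tpr = tp / (tp + fn) if tp + fn else 0.0
    tnr = tn / (tn + fp) if tn + fp else 0.0
    return math.sqrt(tpr * tnr)

def f_value(y_true, y_pred):
    """F1 score of the minority (+1) class."""
    tp, fn, tn, fp = _counts(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

# Toy imbalanced data (illustrative): 2 minority and 6 majority labeled samples.
X_lab = [[2.0, 2.0], [2.5, 1.8], [0.0, 0.0], [0.2, -0.1],
         [-0.3, 0.4], [0.5, 0.3], [0.1, 0.6], [-0.2, -0.4]]
y_lab = [1, 1, -1, -1, -1, -1, -1, -1]
X_unlab = [[2.2, 2.4], [0.3, 0.1], [-0.1, 0.2], [1.9, 2.1]]
X_test = [[2.1, 1.9], [0.0, 0.3], [0.4, -0.2], [2.4, 2.3]]
y_test = [1, -1, -1, 1]

# Penalise minority-class errors more heavily (3:1 mirrors the class ratio).
model, pseudo = self_train(X_lab, y_lab, X_unlab, c_pos=3.0, c_neg=1.0)
y_pred = predict(model, X_test)  # step 4: evaluate on the test set
print("G-mean:", g_mean(y_test, y_pred), "F-value:", f_value(y_test, y_pred))
```

In practice a library solver would replace the hand-written trainer. For instance, scikit-learn's `SVC` accepts a `class_weight` argument that scales the penalty `C` per class, which matches the assumption made here about Biased-SVM.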