As research on imbalanced classification deepens, how to split the data into training and test sets has become a question worth examining. This paper addresses dataset design for imbalanced text sentiment classification. Using under-sampling, it constructs two test-set schemes, one balanced and one imbalanced, explains which scheme to choose under different task requirements, and discusses the evaluation metrics used to verify classifier performance. Experiments on two real-world online review datasets indicate that the proposed schemes are reasonable and applicable.
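To make the balanced versus imbalanced test-set distinction concrete, the following is a minimal sketch, not the paper's actual procedure. It assumes random under-sampling of the majority class, and it uses accuracy and the geometric mean of per-class recall (G-mean) as example metrics; the abstract does not name specific metrics, so both choices, along with the helper names `undersample`, `accuracy`, and `g_mean` and the toy 90/10 data, are illustrative assumptions.

```python
import random


def undersample(samples, labels, seed=0):
    """Randomly keep as many items per class as the smallest class has.

    Hypothetical helper: one simple form of random under-sampling.
    """
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    n_min = min(len(xs) for xs in by_class.values())
    pairs = []
    for y, xs in by_class.items():
        pairs.extend((x, y) for x in rng.sample(xs, n_min))
    rng.shuffle(pairs)
    return [x for x, _ in pairs], [y for _, y in pairs]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def g_mean(y_true, y_pred):
    """Geometric mean of per-class recall (assumed metric, not from the paper)."""
    classes = sorted(set(y_true))
    product = 1.0
    for c in classes:
        total = sum(t == c for t in y_true)
        hits = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        product *= hits / total
    return product ** (1.0 / len(classes))


if __name__ == "__main__":
    # Toy imbalanced test set: 90 positive and 10 negative reviews (made-up data).
    reviews = list(range(100))
    labels = ["pos"] * 90 + ["neg"] * 10

    # A degenerate classifier that always predicts the majority class.
    def predict(xs):
        return ["pos"] * len(xs)

    # Imbalanced test set: accuracy looks high, G-mean exposes the failure.
    print(accuracy(labels, predict(reviews)), g_mean(labels, predict(reviews)))

    # Balanced test set obtained by under-sampling the majority class.
    bal_x, bal_y = undersample(reviews, labels)
    print(accuracy(bal_y, predict(bal_x)), g_mean(bal_y, predict(bal_x)))
```

Under these assumptions, a classifier that always predicts the majority class scores well on accuracy with the imbalanced test set but poorly once the test set is balanced, which illustrates why the choice of test-set scheme and the choice of metric need to be considered together.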