Reader emotion classification aims to predict the emotions that a text is likely to evoke in its readers. For this new task, the main challenge is the scarcity of annotated corpora. To alleviate this problem, this paper proposes an active learning approach to reader emotion classification: starting from a small set of annotated samples, active learning is used to select high-quality samples, so that good classification performance is obtained at as little annotation cost as possible. Because reader emotion classification for news can draw on both the news text and the comment text, we propose a classifier combination method that uses both, together with an active learning selection strategy that combines uncertainty with the amount of information in the news comments. Experiments show that classifier combination achieves better classification performance than using the news text alone. In addition, the proposed active learning method effectively reduces the amount of annotation required and, at the same annotation scale, achieves better classification performance than random selection.
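
The selection idea summarized above can be illustrated with a minimal sketch. The abstract does not give the exact fusion rule or scoring formula, so everything concrete here is an assumption made for illustration: the two classifiers are fused by a weighted average of their posteriors, uncertainty is measured by normalized entropy, and "comment information" is approximated by a hypothetical comment word count. The names `fuse`, `entropy`, `select_for_annotation`, `weight`, and `alpha` are illustrative and do not come from the paper.

```python
import math


def fuse(p_news, p_comment, weight=0.5):
    """Weighted average of two posterior distributions (an assumed fusion rule)."""
    return [weight * a + (1.0 - weight) * b for a, b in zip(p_news, p_comment)]


def entropy(probs):
    """Shannon entropy in nats; higher means the prediction is less certain."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_for_annotation(candidates, k, alpha=0.5):
    """Pick the k unlabeled samples with the highest combined score.

    candidates: list of (sample_id, p_news, p_comment, n_comment_words),
    where p_news and p_comment are posteriors over at least two emotion classes.
    The comment word count is a hypothetical proxy for comment information.
    """
    max_words = max(c[3] for c in candidates) or 1  # avoid division by zero
    scored = []
    for sample_id, p_news, p_comment, n_words in candidates:
        fused = fuse(p_news, p_comment)
        uncertainty = entropy(fused) / math.log(len(fused))  # scaled to [0, 1]
        information = n_words / max_words                     # scaled to [0, 1]
        score = alpha * uncertainty + (1.0 - alpha) * information
        scored.append((score, sample_id))
    scored.sort(reverse=True)  # highest score first
    return [sample_id for _, sample_id in scored[:k]]
```

Under these assumptions, a sample is preferred for annotation when the fused classifier is unsure about it and its comments are comparatively rich; `alpha` trades off the two criteria.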