Sentiment classification aims to identify the sentiment polarity expressed in a text, such as positive versus negative or support versus opposition. This paper proposes an emotion-word-based approach to Chinese sentiment classification that needs no manually labeled data, only large-scale unlabeled data and a small set of emotion words. Specifically, we first use the emotion words to extract automatically labeled samples with high precision from the unlabeled data and use them as training samples, and then train a classifier on these samples together with the remaining unlabeled data via semi-supervised learning. Experimental results show that the approach achieves good classification performance in two domains, product reviews and hotel reviews.
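The two-step pipeline can be illustrated with a minimal sketch. The abstract does not name the semi-supervised algorithm or the emotion lexicon, so everything specific here is an assumption made for illustration: self-training with a small multinomial Naive Bayes classifier stands in for the semi-supervised learner, `POS_WORDS` and `NEG_WORDS` are hypothetical seed lexicons, the confidence `threshold` is arbitrary, and documents are assumed to be already segmented into tokens (Chinese text would need a word segmenter first).

```python
import math
from collections import Counter

# Hypothetical seed lexicons; the paper's actual emotion words are not given.
POS_WORDS = {"happy", "delighted"}
NEG_WORDS = {"angry", "disappointed"}


def seed_label(tokens):
    """Return 1 (positive) or 0 (negative) if exactly one kind of emotion
    word occurs in the document, otherwise None."""
    has_pos = any(t in POS_WORDS for t in tokens)
    has_neg = any(t in NEG_WORDS for t in tokens)
    if has_pos == has_neg:  # neither kind or both kinds: leave unlabeled
        return None
    return 1 if has_pos else 0


def train_nb(docs, labels):
    """Collect the counts needed for multinomial Naive Bayes."""
    counts = {0: Counter(), 1: Counter()}
    n_docs = Counter(labels)
    for tokens, y in zip(docs, labels):
        counts[y].update(tokens)
    vocab = set(counts[0]) | set(counts[1])
    return counts, n_docs, vocab


def prob_positive(model, tokens):
    """Posterior probability of the positive class, with add-one smoothing.
    Tokens outside the training vocabulary are ignored."""
    counts, n_docs, vocab = model
    total = n_docs[0] + n_docs[1]
    log_score = {}
    for y in (0, 1):
        denom = sum(counts[y].values()) + len(vocab)
        score = math.log((n_docs[y] + 1) / (total + 2))
        for t in tokens:
            if t in vocab:
                score += math.log((counts[y][t] + 1) / denom)
        log_score[y] = score
    top = max(log_score.values())
    e0 = math.exp(log_score[0] - top)
    e1 = math.exp(log_score[1] - top)
    return e1 / (e0 + e1)


def self_train(unlabeled, threshold=0.8, max_rounds=5):
    """Step 1: label documents with emotion words.
    Step 2: self-training (an assumed stand-in for the paper's learner):
    repeatedly add confidently classified documents to the training set."""
    docs, labels, pool = [], [], []
    for tokens in unlabeled:
        y = seed_label(tokens)
        if y is None:
            pool.append(tokens)
        else:
            docs.append(tokens)
            labels.append(y)
    if len(set(labels)) < 2:
        raise ValueError("emotion words must yield samples of both classes")
    model = train_nb(docs, labels)
    for _ in range(max_rounds):
        remaining = []
        for tokens in pool:
            p = prob_positive(model, tokens)
            if p >= threshold:
                docs.append(tokens)
                labels.append(1)
            elif p <= 1 - threshold:
                docs.append(tokens)
                labels.append(0)
            else:
                remaining.append(tokens)
        if len(remaining) == len(pool):  # nothing new was added: stop
            break
        pool = remaining
        model = train_nb(docs, labels)
    return model


# Toy, made-up corpus of pre-tokenized hotel-style reviews.
CORPUS = [
    ["happy", "with", "clean", "room"],
    ["delighted", "clean", "room", "friendly", "staff"],
    ["angry", "dirty", "room"],
    ["disappointed", "dirty", "room", "rude", "staff"],
    ["clean", "friendly"],
    ["dirty", "rude"],
]
```

Calling `self_train(CORPUS)` returns a model that `prob_positive` can apply to new token lists. Leaving unlabeled any document that contains both kinds of emotion word, or neither, is one simple way to favor precision over coverage in the automatically labeled set, which matches the abstract's emphasis on high-precision samples.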