Sentiment polarity classification, a core sentiment analysis task, assigns a text to one of three classes: positive, negative, or neutral. Microblog posts are more colloquial and more heavily laden with symbols than conventional review text, which makes classifying them especially challenging. Machine learning approaches are the classic solution to this task, and their core is the analysis and selection of features such as bag-of-words. However, many of the features that worked well in previous studies are language-dependent, and applying them directly to Chinese microblogs gives poor results. Moreover, no detailed feature engineering has yet been carried out on Chinese microblog corpora. In this paper, we draw on features proposed in the Chinese and international literature, take the characteristics of Chinese into account, and study word, phrase, numeric, and syntactic features for positive/negative/neutral classification of Chinese microblogs. We also propose a new feature, a sentiment score based on dictionary rules. Through extensive experiments and analysis, we obtain a reliable feature combination. The experimental results show that this approach markedly improves sentiment classification results.
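To make the idea of a dictionary-rule based sentiment score concrete, the following is a minimal sketch of how such a score could be computed and appended to bag-of-words features. It is an illustration only, not the method of this paper: the lexicons, the negation and intensifier rules, and the names `sentiment_score` and `features` are all assumptions, and the input is assumed to be already word-segmented.

```python
# Hypothetical toy lexicons; the actual dictionaries and rules used in the
# paper are not specified in the abstract.
POSITIVE = {"好", "喜欢", "开心"}
NEGATIVE = {"差", "讨厌", "难过"}
NEGATORS = {"不", "没"}
INTENSIFIERS = {"很": 1.5, "非常": 2.0}


def sentiment_score(tokens):
    """Sum word polarities over a list of segmented tokens.

    Assumed rules: a negator flips the sign and an intensifier scales the
    weight of the next sentiment word; modifiers stay pending until then.
    """
    score = 0.0
    sign = 1.0
    weight = 1.0
    for tok in tokens:
        if tok in NEGATORS:
            sign = -sign
        elif tok in INTENSIFIERS:
            weight *= INTENSIFIERS[tok]
        elif tok in POSITIVE or tok in NEGATIVE:
            polarity = 1.0 if tok in POSITIVE else -1.0
            score += sign * weight * polarity
            # Reset the modifiers once they have been consumed.
            sign, weight = 1.0, 1.0
    return score


def features(tokens):
    """Bag-of-words counts plus the lexicon score as one numeric feature."""
    feats = {}
    for tok in tokens:
        key = "bow=" + tok
        feats[key] = feats.get(key, 0) + 1
    feats["lexicon_score"] = sentiment_score(tokens)
    return feats
```

A feature dictionary of this kind can be passed to any standard classifier after vectorization. In this sketch the score is a single real-valued feature, so its contribution can be measured by training with and without it.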