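The mobile Internet era produces huge volumes of short online messages, and quickly and accurately extracting the emotions users express in these data has a wide range of applications. Humans read text sequentially to grasp its emotional content, and this paper simulates that habit with the refraction of light, proposing a Sentiment Refraction Model (SRM) for short-text classification. First, starting from a few seed sentiment words, a sentiment dictionary covering six emotion categories (interest, anger, sadness, enjoyment, fear and disgust) is built heuristically with word2vec and the k-nearest-neighbor classification algorithm. All words in a category are assumed to carry the same sentiment strength, and each sentiment word is assumed to have a sentiment refractive index that depends on its context. Second, for a given short text containing one or more sentiment words, a sentiment light ray enters the text at a given initial angle of incidence. As it is refracted successively by the media formed by the different sentiment words, its direction of propagation changes, and the sentiment polarity of the text is determined from the difference between the ray's exit angle and its initial angle of incidence. Finally, the method is evaluated on standard datasets released by NLP&CC, COAE and others, and compared with classifiers based on weighted summation of sentiment polarity, Naive Bayes and support vector machines. The experimental results show that the Sentiment Refraction Model performs well on different types of short-text datasets. In addition, comparing the classification results obtained with a basic dictionary and with an extended dictionary confirms the effectiveness of the dictionary extension method.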
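The dictionary-construction step described above can be illustrated with a minimal sketch. It assumes word vectors have already been trained (for instance with word2vec), measures closeness with cosine similarity, and labels an unlabelled word with the majority category among its k most similar dictionary words, ignoring neighbours below a similarity threshold. The function `expand_lexicon`, its parameters `k`, `rounds` and `min_sim`, and the toy two-dimensional vectors are hypothetical; the abstract does not state the paper's exact heuristic.

```python
import math
from collections import Counter

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def expand_lexicon(embeddings, seeds, k=3, rounds=1, min_sim=0.5):
    """Hypothetical kNN expansion (not the paper's exact procedure).
    embeddings: word -> vector; seeds: seed word -> emotion category."""
    lexicon = {w: c for w, c in seeds.items() if w in embeddings}
    for _ in range(rounds):
        added = {}
        for word, vec in embeddings.items():
            if word in lexicon:
                continue
            # Similarity to every word already in the dictionary, most similar first.
            scored = sorted(
                ((cosine(vec, embeddings[w]), c) for w, c in lexicon.items()),
                reverse=True,
            )
            # Keep the k nearest neighbours that clear the similarity threshold.
            neighbours = [c for s, c in scored[:k] if s >= min_sim]
            if not neighbours:
                continue
            category, votes = Counter(neighbours).most_common(1)[0]
            if votes > len(neighbours) / 2:  # require a strict majority
                added[word] = category
        if not added:
            break  # nothing new was labelled, so later rounds would change nothing
        lexicon.update(added)
    return lexicon

# Toy vectors (hypothetical); real ones would come from a trained word2vec model.
emb = {
    "happy": (1.0, 0.1), "glad": (0.9, 0.2),
    "furious": (-0.1, 1.0), "angry": (0.0, 0.9),
    "cheerful": (0.95, 0.15), "irate": (-0.05, 0.95),
    "table": (-1.0, -1.0),
}
seeds = {"happy": "enjoyment", "glad": "enjoyment", "furious": "anger", "angry": "anger"}
lex = expand_lexicon(emb, seeds)
# cheerful -> enjoyment, irate -> anger, table stays unlabelled
print(lex)
```

Allowing more than one round lets newly added words vote in later rounds, which is one way to grow a dictionary from a handful of seeds; it also propagates early mistakes, which is why the sketch keeps a similarity threshold.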
With the widespread development of the mobile Internet, user-generated content has been growing explosively, and quickly identifying the exact emotions in these data is important for a variety of applications. Considering the human habit of reading text sequentially to obtain its emotion, this paper presents a new Sentiment Refraction Model for sentiment analysis of short text, inspired by the phenomenon of light refraction. First, starting from a few sentiment seed words and using word embeddings and k-nearest neighbors, we build a sentiment dictionary containing six kinds of sentiment words: interest, anger, sadness, enjoyment, fear and disgust. We assume that all sentiment words of the same kind have equal sentiment strength and that these words may take particular refractive indices in different contexts. Second, for each short text containing one or more sentiment words, an emotional light ray enters the medium of emotional words at an initial angle of incidence. After several consecutive refractions through the emotional-word media, the ray exits at some angle, and the emotional polarity of the text is decided by the difference between the initial angle of incidence and the exit angle. Finally, experiments are conducted on datasets from NLP&CC and COAE. Compared with emotional-polarity weighted summation, Naive Bayes and support vector machines, the Sentiment Refraction Model outperforms these methods across different kinds of short-text datasets. In addition, comparing the results obtained with the basic and the extended sentiment dictionaries confirms the effectiveness of the dictionary extension method.
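The refraction step can likewise be sketched, under loudly stated assumptions, since the abstract gives no formulas. This sketch applies Snell's law, sin θ_out = sin θ_in / n, at each sentiment word in reading order and treats each word's index n as a relative index with respect to the current medium. With absolute indices and parallel layers, the ray would leave the text at exactly its entry angle, so no polarity could be read off. Positive words get a hypothetical index below 1 (the ray bends away from the normal and the angle grows), negative words get its reciprocal, a preceding negator flips the index as a stand-in for context dependence, and grazing rays are clamped at 90° instead of modelling total internal reflection. The names `refract`, `classify`, `N_POSITIVE` and `N_NEGATIVE`, the numeric values, and the mapping of the six emotion categories to +1/-1 polarities are all hypothetical.

```python
import math

# Hypothetical relative refractive indices: one shared value for every positive
# word and its reciprocal for every negative word (equal strength per polarity).
N_POSITIVE = 0.9
N_NEGATIVE = 1.0 / N_POSITIVE

def refract(theta, n_rel):
    """Snell's law at one interface, sin(out) = sin(in) / n_rel, clamped at 90 degrees
    instead of modelling total internal reflection."""
    return math.asin(min(math.sin(theta) / n_rel, 1.0))

def classify(tokens, lexicon, negators=frozenset(), theta0=math.radians(45.0), eps=1e-9):
    """Trace the ray through the sentiment words of `tokens` in reading order.
    `lexicon` maps word -> +1 or -1; returns (label, exit angle - entry angle in radians)."""
    theta = theta0
    negate = False
    for tok in tokens:
        if tok in negators:
            negate = True  # flips the index of the next sentiment word
            continue
        polarity = lexicon.get(tok)
        if polarity is None:
            continue  # non-sentiment words are treated as transparent
        n = N_POSITIVE if polarity > 0 else N_NEGATIVE
        if negate:
            n, negate = 1.0 / n, False
        theta = refract(theta, n)
    delta = theta - theta0
    if delta > eps:
        return "positive", delta
    if delta < -eps:
        return "negative", delta
    return "neutral", delta

lexicon = {"good": 1, "great": 1, "bad": -1, "terrible": -1}  # hypothetical polarities
print(classify("the food was great".split(), lexicon))                # label: positive
print(classify("not good".split(), lexicon, negators={"not"}))       # label: negative
print(classify("good but bad".split(), lexicon))                      # label: neutral
```

Because every positive word shares one index and every negative word its reciprocal, a positive and a negative word cancel unless the ray was clamped, which mirrors the equal-strength assumption in the abstract; the 90° clamp makes long runs of same-polarity words saturate, so word order can matter.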